1. Introduction to Optimization

The speed of a message-passing parallel code depends on the performance of both the local hosts and
the message-passing environment. Optimization of parallel code is usually carried out in an iterative
process involving several tools to investigate performance issues. Many of the computational
optimizations are no different from the ones needed for a serial code. For more information, see
Performance Basics, Single Processor Performance Tools, Single Processor Performance Considerations
for the SP2, and Timing and Optimizing a Fortran Program.

This module is concerned with issues that range from gathering profile data that will help with the
design of a message-passing scheme for your application, through debugging the code that you have
written, to tracing what actually happened during execution. After each section you will have the option
of trying out one or more of the tools that have been discussed.

Overviews of parallel tools at CTC are available on CTC's web site.

2. Profiling

For the purpose of this module, we will define a profiling tool as any tool that reports the cumulative
time spent in various parts of a program over the length of a run. A more restrictive definition would 
limit profilers to tools that obtain timing information by sampling. We will also consider tools that 
obtain information by embedding timing routines. 
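
As an illustration of the second kind of tool, the sketch below embeds timing calls around named regions of a program and accumulates the time spent in each. It is a minimal example, not the interface of any tool listed in this module: the names timed, totals, calls, and report are hypothetical, and the "exchange" region merely sleeps in place of a real message-passing call.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Cumulative wall-clock seconds and call counts per named region.
totals = defaultdict(float)
calls = defaultdict(int)


@contextmanager
def timed(region):
    """Add the elapsed wall-clock time of the enclosed block to `region`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[region] += time.perf_counter() - start
        calls[region] += 1


def report():
    """Return (region, seconds, calls) rows, most expensive region first."""
    rows = [(region, totals[region], calls[region]) for region in totals]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for _ in range(3):
        with timed("compute"):
            sum(i * i for i in range(100_000))
        with timed("exchange"):  # stand-in for a message-passing call
            time.sleep(0.01)
    for region, seconds, count in report():
        print(f"{region:10s} {seconds:8.4f} s  {count} calls")
```

Because the timers are placed by hand, this approach gives the greatest control over what is measured, at the cost of editing the source.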

Profiling is important in parallel code optimization because the performance of a message passing code 
is closely related to its granularity, defined here as the ratio of the time between communication events 
to the duration of an event. To minimize time spent communicating, you should maximize your code's 
granularity by parallelizing at the highest feasible level. Save yourself the trouble of implementing a 
parallelization strategy that will result in too fine a granularity by discovering from a profiling run that 
the time T between message passing events is quite short. A good rule of thumb is that T should be 
greater than ten times the latency for sending a message. 
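
The rule of thumb can be checked directly from profile data. The sketch below is only an illustration: the function names are hypothetical, and the latency and timing values in the example are made-up numbers, not measurements of any particular machine.

```python
def granularity(time_between_events, event_duration):
    """Ratio of the time between communication events to the duration of an event."""
    return time_between_events / event_duration


def coarse_enough(time_between_events, latency, factor=10.0):
    """Rule of thumb from the text: T should exceed `factor` times the message latency."""
    return time_between_events > factor * latency


if __name__ == "__main__":
    latency = 40e-6           # assumed message latency in seconds (hypothetical)
    for t in (300e-6, 5e-3):  # assumed times between message-passing events
        verdict = "ok" if coarse_enough(t, latency) else "too fine"
        print(f"T = {t * 1e6:7.1f} us: {verdict}")
```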

For HPF codes, profilers should also be used in conjunction with compiler parallelization reports to 
determine the effect of adding HPF directives to your code. You need to determine whether you've 
coded correctly, and whether the compiler is handling the directives as you expected. 



Tool             Description                               Source          Status
---------------  ----------------------------------------  --------------  -------------------------
Xprofiler        Subroutines and statements,               IBM prototype   Working; communication
                 any code compiled with IBM XL compilers                   not accurately attributed
Forge Profiling  HPF subroutines and loops,                APR             Working
                 breakdown of communication overhead
Pgprof           HPF subroutines and statements,           Portland Group  Working
                 per process or summarized
Time Functions   Greatest control,                         Varies          Working
                 any code


Try Pgprof 






3. Debugging 

To debug a message-passing code, you can use IBM's parallel debugger or a serial debugger (if you are 
not running very many processes). All provide the standard serial debugging actions. All require the -g 
flag at compile time, to associate the source code you wrote with the assembly language code. 

You cannot use these debuggers directly on HPF code since, at some stage, the compiler translates the 
code to message-passing. If your compiler allows the intermediate message-passing code to be saved, 
and you are desperate, it is possible to use a debugger on this code. The generated code is usually not
very readable: variables are often renamed and loop indices reformulated.

Tool       Description                           Source            Status
---------  ------------------------------------  ----------------  -------
Pdbx/Pedb  Command-line or graphical interface,  IBM PE            Working
           one or multiple windows,
           parallel task manipulation
Xldb       Serial debugger,                      IBM XL compilers  Working
           graphical interface,
           one window per task
Dbx        Serial debugger,                      UNIX              Working
           command line interface,
           one session per task,
           can diagnose deadlock




4. Tracing 

A trace usually gives you the most information about how well your parallel job is doing.
It is here that you can find out exactly when tasks stall because they are waiting for messages. Tracing
involves (at least) two steps: generating the trace and viewing the results.

At runtime, trace records are written when message-passing library routines are called or at set time 
intervals. The trace records contain the type of event and a timestamp. After the run completes, you can 
use a tool to graphically display and summarize this data. 
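
To make the idea of a trace record concrete, the sketch below models a record and sums the time each task spent inside each kind of event. It is a simplified illustration, not the record layout of any tool in this module: the text only says that a record holds an event type and a timestamp, so the task number and the begin/end phase field are assumptions, as are all names used here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    task: int         # task that wrote the record (assumed field)
    event: str        # event type, e.g. "recv" (names are illustrative)
    phase: str        # "begin" or "end" (assumed field)
    timestamp: float  # seconds since the start of the run


def time_per_event(records):
    """Sum, per (task, event), the time between matching begin/end records."""
    open_since = {}
    totals = defaultdict(float)
    for rec in sorted(records, key=lambda r: r.timestamp):
        key = (rec.task, rec.event)
        if rec.phase == "begin":
            open_since[key] = rec.timestamp
        elif rec.phase == "end" and key in open_since:
            totals[key] += rec.timestamp - open_since.pop(key)
    return dict(totals)


if __name__ == "__main__":
    trace = [
        TraceRecord(0, "recv", "begin", 1.0),
        TraceRecord(1, "send", "begin", 1.0),
        TraceRecord(1, "send", "end", 1.25),
        TraceRecord(0, "recv", "end", 3.5),
    ]
    for (task, event), seconds in sorted(time_per_event(trace).items()):
        print(f"task {task} spent {seconds:.2f} s in {event}")
```

A long total for a blocking receive on one task is the kind of stall that the display tools below show graphically.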

All the tools listed below have full functionality on message-passing programs. NTV, Vampir, VT, and
Pablo will also work for HPF programs. To use source association (the tool's ability to point to the
location in the source code where a trace record was generated) on an HPF code with NTV and VT, the
program must be compiled with xlhpf or xlhpf90.



Tool         Description                                     Source                  Status
-----------  ----------------------------------------------  ----------------------  -----------
NTV          Static timeline for complete run,               NASA                    Working
             communication summaries,
             source association,
             easy to use/learn
Vampir       Static or animated displays,                    Pallas GmbH             Working
             timeline, node usage, information on messages,
             smaller trace files
VT           Many communication and system displays,         IBM PE                  Working
             animated displays,
             source association on current event
UTE/Nupshot  Timeline, function summaries, efficiency,       IBM, Argonne Nat'l Lab  Not working
             minimum overhead for trace generation,
             source association,
             follow other AIX events
Pablo        Can construct trace analysis routines,          University of Illinois  Working
             many different message-passing libraries

Try NTV    Try Vampir    Try VT

Pablo Tutorial 1    Pablo Tutorial 2



4.1 Converting Trace File Formats 

Each of the trace display tools requires that the trace file be in a different format. VT requires .trc, 
nupshot needs .ups, and Pablo wants .sddf. If you have done your own profiling (see below), you may 
have a trace file in alog format. Fortunately, there are three translators available for making conversions. 

alog2ups converts files from alog (old upshot format) into .ups (nupshot format). 

mp2sddf converts files from VT format (.trc) into SDDF format. 

ute2sddf converts files from UTE format (.ups) into SDDF format. 

An additional advantage of the SDDF format is that it is ASCII text, so it can be browsed with an editor 
if you are trying to track down a tracing problem or want to see a specific event record. 
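
The three translators can be chained, for example to get from alog to SDDF by way of .ups. The sketch below only works out which translators to apply and in what order; it does not run them, since their command-line arguments are not given here. The table is taken from the descriptions above, and the function name conversion_chain is hypothetical.

```python
from collections import deque

# Translators named in the text: tool -> (source format, target format).
TRANSLATORS = {
    "alog2ups": ("alog", "ups"),
    "mp2sddf": ("trc", "sddf"),
    "ute2sddf": ("ups", "sddf"),
}


def conversion_chain(source, target):
    """Return the translator names that turn `source` into `target`, in order.

    Returns an empty list if no conversion is needed and None if the
    translators above cannot produce the target format.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == target:
            return chain
        for tool, (src, dst) in TRANSLATORS.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, chain + [tool]))
    return None


if __name__ == "__main__":
    print(conversion_chain("alog", "sddf"))
    print(conversion_chain("trc", "ups"))
```

Note that no path leads back from SDDF, so a trace converted for Pablo cannot be returned to VT or nupshot format with these tools.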



4.2 Profiling Combined with Tracing 

Profilers and tracing tools "instrument" your code for you. If you don't feel that these are displaying the 
information you are most interested in, in the most usable form, you could consider adding your own 
profiling calls to your program. 



Tool          Description                   Source             Status
------------  ----------------------------  -----------------  -----------
Vampir        MPI code,                     Pallas GmbH        Working
              can show duration of state
NTV/VT        MPI/MPL code,                 IBM PE             Working
              can show location of event
Alog/Nupshot  Any message passing library,  Argonne Nat'l Lab  Not working
              can show duration of state
UTE Markers   MPI code,                     IBM                Not working
              can show duration of state






Take a multiple-choice quiz on this material, and submit it for grading. 
Access all profiling, debugging, and tracing exercises.

Please complete this short evaluation form. Thank you!

VT



Using VT 



VT is the portion of the IBM Parallel Environment devoted to trace visualization and performance
monitoring. The IBM Parallel Environment for AIX: Operation and Use Version 2.1.0 manual
(GC23-3891-00) devotes 80 pages to VT. Thus, only a few of the highlights are mentioned here.

A VT trace file is produced by specifying a nonzero value for the environment variable MP_TRACELEVEL.
Recognized values and their consequences are:

    1   Markers only
    2   Markers and kernel statistics
    3   Markers, messages, collective communication
    9   Everything

The trace file is automatically named by appending .trc to the name of the executable.

Trace files contain a lot of information, so they tend to grow very rapidly. If message-passing traffic is
high, trace files can grow by as much as 0.1 MB/node/second. To limit the amount of trace information, you
may insert calls to the routines vt_trc_start(level, error) and vt_trc_stop(error) in your code.
Also, take care that both the temporary and merged trace files are written in directories with
sufficient space. It is best to set MP_TMPDIR=/tmp/scratch/<username> and, for a batch job, to use
MP_TRACEDIR=/sptmp/trace.
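
A quick estimate before a run shows whether the trace directory is large enough. The sketch below simply multiplies the 0.1 MB/node/second figure quoted above by the number of nodes and the run time; that figure is an upper bound for heavy message traffic, and the function names and example numbers are hypothetical.

```python
def max_trace_size_mb(nodes, seconds, rate_mb_per_node_s=0.1):
    """Upper estimate of trace size in MB for heavy message-passing traffic."""
    return nodes * seconds * rate_mb_per_node_s


def fits(nodes, seconds, free_mb, rate_mb_per_node_s=0.1):
    """True if the estimated trace would fit in `free_mb` of scratch space."""
    return max_trace_size_mb(nodes, seconds, rate_mb_per_node_s) <= free_mb


if __name__ == "__main__":
    # Example values only: a 16-node, 10-minute run and 2000 MB of free space.
    size = max_trace_size_mb(16, 600)
    print(f"estimated trace size: up to {size:.0f} MB")
    print("fits in 2000 MB:", fits(16, 600, 2000))
```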

To view a trace file (a.out.trc) that you have already produced, invoke VT with vt -tfile a.out.trc.
You will get a view selection window that gives you many options. If you traced everything
(MP_TRACELEVEL=9 when you collected the trace), then both system and user events will be
available for viewing.



VT Screen Dumps

This is the VT control panel. Most VT views are animated; VT attempts to draw them at the same speed
that the original program ran. Some are instantaneous (you only view the current state); some are
streaming (the past state is left on the display). The controls allow the program to be run continuously or
stepped through (note the VCR-like buttons), the view to be magnified, and start and end points for
playback to be selected.




The user can select any number of views of the trace information, although the playback will slow down 
considerably if many views are opened. This is the "Interprocessor Communication" display, a 
color-coded timeline that shows what message-passing state (if any) each task was in. Arrows are drawn 
for communication events. You can use a search menu to locate where an event of interest occurred. 



If you left-click on the "Interprocessor Communication" display, a popup window gives the time and the
state the task was in. The time scale for the display is not labeled. It is also not uniform unless the
colored bars are solid. Bars that are shown hatched are not as long as they should be, given the
magnification chosen.



The "System Summary" display shows the amount of processor time spent running the program (in
green), running the operating system (in blue), waiting for resources (in yellow), and idle (none shown
here). Usage is averaged over a very short interval by default. If you want to see it averaged over the
trace up to the current event, right-click in the display and select show cumulative.

The "User Load Balance" display shows the instantaneous and average CPU percentage for the program.
Instantaneous is 0 (all yellow) in this example, but the hatched green polygon indicates the average CPU
percentage.





spilled to memory. Scheduling of blocking instructions is improved by
pre-allocating space in the scheduling reservation table. Improved

ABSTRACT EP 631252 A2

A draw processor for a graphics accelerator is disclosed that performs
edgewalking and scan interpolation functions to render a three
dimensional geometry object defined by a draw packet. The draw processor
renders a subset of pixels on a scan line, such that a set of draw
processors taken together render the entire geometry object. The draw
processor renders pixels into an interleave bank of a multiple bank
interleaved frame buffer. The draw processor also processes direct port
data through a direct port pipeline. (see image in original document)

Floating-point processor for a high performance three dimensional graphics
accelerator

Graphikbeschleuniger
Processeur a virgule flottante pour un accelerateur graphique

ABSTRACT EP 627682 A1

A floating-point processor for a high performance three dimensional
graphics accelerator in a computer system is disclosed. The
floating-point processor implements specialized graphics micro
instructions. The specialized graphics micro instructions include a swap
micro instruction which causes a hardware remapping of general purpose
register groups to sort triangle vertices. The specialized graphics micro
instructions also include specialized conditional branches for three
dimensional geometry. (see image in original document)

ABSTRACT WORD COUNT: 70 

LEGAL STATUS (Type, Pub Date, Kind, Text):
 Application:  941207 A1  Published application (A1 with Search Report;
                          A2 without Search Report)
 Examination:  950628 A1  Date of filing of request for examination: 950501
 Examination:  980121 A1  Date of despatch of first examination report: 971205
 Change:       990317 A1  International patent classification (change)
 Change:       990317 A1  Obligatory supplementary classification (change)
 Grant:        990526 B1  Granted patent
LANGUAGE (Publication, Procedural, Application): English; English; English
FULLTEXT AVAILABILITY:
Available Text  Language   Update  Word Count
CLAIMS B        (English)  9921          1439
CLAIMS B        (German)   9921          1195
CLAIMS B        (French)   9921          1749
SPEC B          (English)  9921          7619
Total word count - document A               0
Total word count - document B           12002
Total word count - documents A + B      12002

27/5/6 



(Item 6 from file: 348) 



12 May 10, 2000 10:32 



Ginger Roberts - Search Report 



DIALOG (R) File 348: European Patents 

(c) 2000 European Patent Office. All rts. reserv. 



00605943 

ORDER fax of complete patent from Dialog SourceOne. See HELP ORDER 348 

A parallel scalable internetworking unit architecture.

ABSTRACT EP 604341 A2

A parallel scalable internetworking unit (IWU) architecture employing
at least two network controllers (PMI), a foreground buffer controller 
(FGAM) with local memory, a background buffer controller (BGAM) with 
local memory, a node processor (NP) and a buffer memory. Each network 
attached to the IWU has an individual PMI which communicates with the 
FGAM. The FGAM interfaces with PMIs and maintains queueing information. 
The BGAM communicates with the FGAM for maintaining packets of data as 
linked lists of buffers in the buffer memory. The NP communicates with 
both the FGAM and the BGAM to process stored header information. And, a 
connection matrix is provided to dynamically interconnect multiple IWUs 
for increased parallel processing of packet traffic and processing. (see
image in original document)

ABSTRACT EP 595453 A1

A distributed data processing system includes a distributed resource
manager which detects dependencies between transactions caused by
conflicting lock requests. A distributed transaction manager stores a
wait-for graph with nodes representing transactions and edges connecting
the nodes and representing dependencies between the transactions. Each edge is
labelled with the identities of the lock requests that caused the
dependency. The distributed transaction manager propagates probes through
the wait-for graph, to detect cyclic dependencies, indicating deadlock. A
deadlock message is then sent to the resource manager identifying a
particular lock request as a victim for detection to resolve the
deadlock. (see image in original document)

ABSTRACT EP 524683 A1

A multiprocessor data processing system (10), and a method of operating
same, so as to provide efficient bandwidth utilization of shared system
resources (24, 26). The system includes a plurality of processor nodes,
each of which includes a data processor (22a, 28a). In accordance with a
method of the invention a first step buffers data written by a data
processor to a first bus (23a), prior to the data being transmitted to a
second bus (32). A second step also buffers byte enable (BE) signals
generated by the data processor in conjunction with the data written by
the data processor. A third step performs a main memory (26) write
operation by the steps of: transmitting the buffered data to the second
bus; responsive to the stored BE signals, also transmitting a control
signal for indicating if a memory write is to be accomplished as a
read-modify-write (RMW) type of memory operation; and transmitting the
stored BE signals to the second bus. A further step couples the data, the
RMW signal, and the BE signals from the local bus to a third bus (24) for
reception by the main memory. Interface circuitry (34) associated with
the main memory is responsive to the RMW signal for (a) reading data from
a specified location within the main memory, (b) selectively merging the
transmitted data in accordance with the BE signals, and (c) storing the
previously read and merged data back into the specified location. (see
image in original document)

DISTRIBUTED DATABASE SYSTEMS';

ABSTRACT EP 501025 A2

A wait depth limited concurrency control method for use in a multi-user 
data processing environment restricts the depth of the waiting tree to a 
predetermined depth, taking into account the progress made by 
transactions in conflict resolution. In the preferred embodiment for a 
centralized transaction processing system, the waiting depth is limited 
to one. Transaction specific information represented by a real-valued 
function L, where for each transaction T in the system at any instant in 
time L(T) provides a measure of the current "length" of the transaction, 
is used to determine which transaction is to be restarted in case of a 
conflict between transactions resulting in a wait tree depth exceeding 
the predetermined depth. L(T) may be the number of locks currently held 
by a transaction T, the maximum of the number of locks held by any 
incarnation of transaction T, including the current one, or the sum of 
the number of locks held by each incarnation of transaction T up to the 
current one. In a distributed transaction processing system, L(T) is 
based on time wherein each global transaction is assigned a starting 
time, and this starting time is included in the startup message for each 
subtransaction, so that the starting time of global transaction is 
locally known at any node executing one of its subtransactions. (see
image in original document)

AMSTERDAM, NL pages 283 - 293 T.C.OPPE ET AL 'AN OVERVIEW OF NSPCG: A
NONSYMMETRIC PRECONDITIONED CONJUGATE GRADIENT PACKAGE';
ABSTRACT EP 461608 A2

In order to solve a symmetric linear system given by Au = b, where A
represents a symmetric coefficient matrix equal to a three-term sum of a
diagonal matrix plus an upper triangular matrix plus a lower triangular
matrix, b represents a right-hand side vector, and u represents a
solution vector, a device calculates the solution vector by using the
right-hand side vector, the diagonal matrix, and the upper triangular
matrix. Supplied with an array (JA) representing a column number of the
upper triangular matrix, a pointer array constructing section (11)
constructs a pointer array (JL) which points to the lower triangular
matrix. Supplied with an array (AA) and the array (JA) which collectively
represent a combination of the diagonal matrix and the upper triangular
matrix, a matrix decomposing section (12) decomposes the array (AA) into
an approximate matrix (M) which approximates the symmetric coefficient
matrix. A first product calculating section (16) calculates a first
product vector (y) by using the array (AA), the array (JA), a first
vector (x), and the pointer array (JL). A second product calculating
section (17) calculates a second product vector (y') by using the
approximate matrix (M) and the second vector (x'). An iterative
calculating section (18) carries out iterative calculation on the array
(AA), the array (JA), an array (B) representing the right-hand side
vector, the first and the second product vectors (y) and (y'). The
iterative calculating section (18) iteratively provides the first and the
second product calculating sections (16) and (17) with the first and the
second vectors (x) and (x'), respectively. The iterative calculating
section (18) produces the solution vector (u). The device may be supplied
with the lower triangular matrix instead of the upper triangular matrix.
(see image in original document)

IEEE FIRST INTERNATIONAL CONFERENCE ON NEURAL NETWORKS vol. 3, 21 June
1987, SAN DIEGO, USA pages 191 - 198 SOMANI 'Compact neural network';
NOTE:
  No A-document published by EPO
LEGAL STATUS (Type, Pub Date, Kind, Text):
 Application:    920318 A1  Published application (A1 with Search Report;
                            A2 without Search Report)
 Examination:    920318 A1  Date of filing of request for examination: 911224
 Search Report:  930602 A1  Drawing up of a supplementary European search
                            report: 930415
 Withdrawal:     930915 A1  Date on which the European patent application
                            was withdrawn: 930630
LANGUAGE (Publication, Procedural, Application): English; English; English



27/5/12 (Item 12 from file: 348) 

DIALOG (R) File 348: European Patents 

(c) 2000 European Patent Office. All rts. reserv. 

00436196 

ORDER fax of complete patent from Dialog SourceOne. See HELP ORDER 348

Processor array system. 

Feldrechnersystem . 

Systeme de processeurs en reseau. 

PATENT ASSIGNEE: 

AMT (HOLDINGS) LIMITED, (1014030), 65 Suttons Park Avenue, Reading
Berkshire RG6 1AZ, (GB), (applicant designated states:
AT; BE; CH; DE; DK; ES; FR; GB; GR; IT; LI; LU; NL; SE)
INVENTOR: 

Hunt, David John, 3 Moores Green, Wokingham, Berkshire, RG11 1QG, (GB) 
LEGAL REPRESENTATIVE: 

Rackham, Stephen Neil et al (35061), GILL JENNINGS & EVERY 53-64 Chancery 
Lane, London WC2A 1HN, (GB) 
PATENT (CC, No, Kind, Date): EP 428327 A1 910522 (Basic)
APPLICATION (CC, No, Date): EP 90312204 901108;
PRIORITY (CC, No, Date): GB 8925721 891114

DESIGNATED STATES: AT; BE; CH; DE; DK; ES; FR; GB; GR; IT; LI; LU; NL; SE

INTERNATIONAL PATENT CLASS: G06F-015/80;

CITED PATENTS (EP A): EP 191280 A; US 4144566 A

ABSTRACT EP 428327 A1

A processor array employs an SIMD architecture and includes a number of 
sub-arrays (S1...S4). Each sub-array (S1...S4) includes n processor 
elements (PE) . Each processor element is connected to local store 
including on-chip memory. Each sub-array is connected to a region of 
off-chip memory by an m-bit wide path, where m is an integer greater than 
1. The m-bit wide path is selectively configurable as a one-bit path to 
or from each of m processor elements or as an m-bit wide path arranged to 
communicate complete m-bit words of memory data between the region of 
off-chip memory and respective processor elements. 

ABSTRACT WORD COUNT: 104 

LEGAL STATUS (Type, Pub Date, Kind, Text):
 Application:  910522 A1  Published application (A1 with Search Report;
                          A2 without Search Report)
 Examination:  920102 A1  Date of filing of request for examination: 911108
 Examination:  940720 A1  Date of despatch of first examination report: 940606
 Withdrawal:   950412 A1  Date on which the European patent application
                          was deemed to be withdrawn: 941018
LANGUAGE (Publication, Procedural, Application): English; English; English

Parametric curve evaluation for a computer graphics display system.
Parametrische Kurvenabschatzung fur graphisches Anzeigesystem mit Rechner.
Evaluation de courbe parametrique pour systeme d'affichage graphique a
calculateur.

PATENT ASSIGNEE:

International Business Machines Corporation, (200120), Old Orchard Road,
Armonk, N.Y. 10504, (US), (applicant designated states: DE; FR; GB; IT)
INVENTOR:

Luken, William Louis, Jr., 2 Orchard Hill Road, Ulster Park, New York
12487, (US)
LEGAL REPRESENTATIVE:

Blakemore, Frederick Norman et al (28381), IBM United Kingdom Limited
Intellectual Property Department Hursley Park, Winchester Hampshire
SO21 2JN, (GB)

PATENT (CC, No, Kind, Date): EP 425174 A2 910502 (Basic)
                             EP 425174 A3 921007
APPLICATION (CC, No, Date): EP 90311369 901017;
PRIORITY (CC, No, Date): US 426912 891024
DESIGNATED STATES: DE; FR; GB; IT
INTERNATIONAL PATENT CLASS: G06F-015/353;

CITED PATENTS (EP A): EP 277832 A; EP 314335 A; US 4760548 A
CITED REFERENCES (EP A):

COMPUTER AIDED DESIGN vol. 19, no. 9, November 1987, LONDON pages 485 -
498; L. PIEGL ET AL.: 'CURVE AND SURFACE CONSTRUCTIONS USING RATIONAL
B-SPLINES';

ABSTRACT EP 425174 A2

A method and apparatus are described for evaluating and rendering 
parametric curves. The apparatus includes a system memory connected to a 
pipelined arrangement of a graphics control processor, a plurality of 
parallel floating point processors, another floating point processor, a 
clipping processor and a frame buffer. The method includes: organizing 
and storing of NURBS data in system memory as a sequence of data records 
such that successive spans of a parametric curve of order k are defined 
by successive individual data records in conjunction with the immediately 
preceding 2k-3 prior data records; transforming the control points from 
modelling coordinates to view coordinates (x,y,z); multiplying the 
transformed control point coordinates by a weight yielding wx, wy, wz, w; 
simultaneously within each parallel floating point processor evaluating 
the b-spline functions for one component of the coordinate set (wx, wy, 
wz, w) for determined parameter points; eliminating the weight from wx, 
wy, wz, w yielding geometric coordinates x, y, z for points on the curve, 
clipping the geometric coordinates to the current viewing boundaries and 
drawing the clipped vectors as straight line segments on a screen of a 
computer graphics display system. (see image in original document) 

ABSTRACT WORD COUNT: 196 

LEGAL STATUS (Type, Pub Date, Kind, Text):
 Application:    910502 A2  Published application (A1 with Search Report;
                            A2 without Search Report)
 Examination:    910502 A2  Date of filing of request for examination: 901213
 Change:         910918 A2  Representative (change)
 Search Report:  921007 A3  Separate publication of the European or
                            International search report
 Change:         921216 A2  Representative (change)
 Examination:    951206 A2  Date of despatch of first examination report: 951019
 Withdrawal:     971112 A2  Date on which the European patent application
                            was deemed to be withdrawn: 970522
LANGUAGE (Publication, Procedural, Application): English; English; English

(c) 2000 European Patent Office. All rts. reserv.
00389650

ORDER fax of complete patent from Dialog SourceOne. See HELP ORDER 348
Parallel processor structure for the implementation and learning of 

artificial neuronal networks. 
Parallelrechnerstruktur zum Modellieren und Trainieren kunstlicher 

Neuronaler Netze. 

Structure de processeurs parallele pour la realisation et l'apprentissage
de reseaux neuronaux artificiels.

PATENT ASSIGNEE: 

Bodenseewerk Geratetechnik GmbH, (435830), Alte Nussdorfer Strasse 15 
Postfach 1120, D-7770 Uberlingen/Bodensee, (DE), (applicant designated 
states : BE; DE; FR; GB; NL) 
INVENTOR: 

Hausing, Michael, Dr.-Ing., Strandweg 29 A, D-7770 Uberlingen, (DE) 
Hesse, HansKlaus, Dr.-Ing., Im Gehren 20, D-7770 Uberlingen, (DE) 
LEGAL REPRESENTATIVE: 

Weisse, Jurgen, Dipl.-Phys. et al (12901), Bokenbusch 41 Postfach 11 03 
86, D-5620 Velbert 11-Langenberg, (DE) 
PATENT (CC, No, Kind, Date) : EP 388806 A2 900926 (Basic) 

EP 388806 A3 920108 
APPLICATION (CC, No, Date) : EP 90104969 900316; 
PRIORITY (CC, No, Date) : DE 3909153 890321 
DESIGNATED STATES: BE; DE; FR; GB; NL 
INTERNATIONAL PATENT CLASS: G06F-015/80; 
CITED PATENTS (EP A) : EP 377221 A 
CITED REFERENCES (EP A) : 

THE COMPUTER JOURNAL vol. 30, no. 5, October 1987, LONDON GB pages
413 - 419; FORREST: 'Implementing neural network models on parallel
computers'
IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS 87 vol. 2, 7 June 1987,
SEATTLE, USA pages 853 - 857; CIOFFI: 'A pipelined fast QR-RLS
structure for high-speed VLSI implementation of adaptive filters'
PROCEEDINGS OF THE 26TH IEEE CONFERENCE ON DECISION AND CONTROL vol. 2,
9 December 1987, LOS ANGELES, USA pages 1461 - 1467; KUNG:
'Systolic designs for state space models: Kalman filtering and neural
network';

ABSTRACT EP 388806 A2 (Translated) 

A parallel processor structure for modelling and training artificial 
neuronal networks is connected to a host computer and constructed as 
two-dimensional matrix of simple identical processor elements. The 
processor elements are supplied with a command stream by a sequencer in 
accordance with the SIMD principle. The processor elements arranged on 
the diagonals of the matrix are allocated to the nodes of the neuronal 
network and intended for carrying out the neuronal functions. The 
non-diagonal processor elements handle the logic combinations between the 
nodes and are selected for the function of the variable synaptic 
weightings. The matrix firstly has a local neighbourhood networking to 
the four next neighbouring processors in each case. In addition, lines 
come from the neuronal processors, separated in x and y direction, which 
drive the non-diagonal synapse processors in parallel. In one direction, 
these lines are used for accelerating the distribution of the calculation 
results of the neuronal processors to the synapse processors. In the 
other direction, the lines are used for the accelerated distribution of 
correction data during the learning process. 

TRANSLATED ABSTRACT WORD COUNT: 177 

ABSTRACT EP 388806 A2 

Eine Parallelrechnerstruktur zum Modellieren und Trainieren kunstlicher
Neuronaler Netze ist an einen Host-Rechner angeschlossen und als
zweidimensionale Matrix aus einfachen, identischen Prozessorelementen
ausgebildet. Die Prozessorelemente werden nach dem SIMD-Prinzip von einem
Sequencer mit einem Befehlsstrom versorgt. Die auf der Diagonalen der
Matrix angeordneten Prozessorelemente werden den Knoten des Neuronalen
Netzes zugeordnet und zur Durchfuhrung der Neuronenfunktionen bestimmt.
Die nichtdiagonalen Prozessorelemente ubernehmen die Verknupfungen
zwischen den Knoten und werden fur die Funktion der veranderbaren
synaptischen Gewichtungen bestimmt. Die Matrix besitzt erstens eine
lokale Nachbarschaftsvernetzung zu den jeweils vier nachsten
Nachbarprozessoren. Daruber hinaus gehen von den Neuronenprozessoren,
getrennt nach x- und y-Richtung, Leitungen aus, welche die
nicht-diagonalen Synapsenprozessoren parallel ansteuern. In der einen
Richtung dienen diese Leitungen der Beschleunigung der Verteilung der
Berechnungsergebnisse der Neuronenprozessoren an die Synapsenprozessoren.
In der anderen Richtung dienen die Leitungen der beschleunigten
Verteilung der Korrekturdaten wahrend des Trainings.
ABSTRACT WORD COUNT: 145 



LEGAL STATUS (Type, Pub Date, Kind, Text):

Application:   900926 A2 Published application (A1 with Search Report;
               A2 without Search Report)
Search Report: 920108 A3 Separate publication of the European or
               International search report
Examination:   920826 A2 Date of filing of request for examination:
               920620
Examination:   930421 A2 Date of despatch of first examination report:
               930304
Withdrawal:    940105 A2 Date on which the European patent application
               was deemed to be withdrawn: 930715
LANGUAGE (Publication, Procedural, Application): German; German; German
FULLTEXT AVAILABILITY:

Remote boot 

Fern-Urlader 

Chargement initial a distance 

PATENT ASSIGNEE: 

DIGITAL EQUIPMENT CORPORATION, (313081), 111 Powdermill Road, Maynard 
Massachusetts 01754-1418, (US), (applicant designated states: 
DE;FR;GB;NL) 
INVENTOR: 

Flaherty, James E., 168 White Pond Road, Hudson Massachusetts 01749, (US) 
LEGAL REPRESENTATIVE: 

Goodman, Christopher et al (31122), Eric Potter & Clarkson St. Mary's 
Court St. Mary's Gate, Nottingham NG1 1LE, (GB) 
PATENT (CC, No, Kind, Date) : EP 358292 A2 900314 (Basic) 

EP 358292 A3 900829 
EP 358292 B1 970910 
APPLICATION (CC, No, Date) : EP 89302132 890303; 
PRIORITY (CC, No, Date) : US 240955 880906 
DESIGNATED STATES: DE; FR; GB; NL 

INTERNATIONAL PATENT CLASS: G06F-009/445; G06F-015/16; G06F-009/44; 

ABSTRACT EP 358292 A2 

A system and method of downloading, over a network, operating systems 
or other executable programs to a computer which does not have a boot 
device or other device containing the executable program. Downloading is 
accomplished without modification of the loadable image. The computer has 
a network interface which requests a minimum-boot program be transferred 
from a host computer on the network. The minimum-boot program, when 
executed, establishes a logical connection to a disk server on the 
network and allows the requesting computer to treat the disk server as a 
local boot device. 

ABSTRACT WORD COUNT: 98 



LEGAL STATUS (Type, Pub Date, Kind, Text):

Application:   900314 A2 Published application (A1 with Search Report;
               A2 without Search Report)
Examination:   900314 A2 Date of filing of request for examination:
               890316
Search Report: 900829 A3 Separate publication of the European or
               International search report
Examination:   941214 A2 Date of despatch of first examination report:
               941028
Grant:         970910 B1 Granted patent
Oppn None:     980902 B1 No opposition filed
LANGUAGE (Publication, Procedural, Application): English; English; English
FULLTEXT AVAILABILITY:
Available Text 
CLAIMS B 
SPEC B 

HIGH PERFORMANCE GRAPHICS WORKSTATION AND METHOD OF OPERATING THEREFOR 
HOCHLEISTUNGSFAHIGES GRAPHISCHES ENDGERAT SOWIE BETRIEBSVERFAHREN DAFUR 
POSTE DE TRAVAIL GRAPHIQUE A HAUTE PERFORMANCE ET METHODE D'EXPLOITATION 
POUR CELA 
PATENT ASSIGNEE: 

DIGITAL EQUIPMENT CORPORATION, (313081), 111 Powdermill Road, Maynard 
Massachusetts 01754-1418, (US), (applicant designated states: 
DE; FR; GB; IT; NL) 
INVENTOR: 

DOYLE, Peter, Lawrence, 279 Davis Street, Northboro, MA 01532, (US) 
ELLENBERGER, John, Philipp, 296 Nashua Road, Groton, MA 01450, (US) 
JONES, Ellis, Olivier, 124 Rattlesnake Hill Road, Andover, MA 01810, (US) 
CARVER, David, C, 6 Independence Avenue, Lexington, MA 02173, (US) 
DIPIRRO, Steven, D. , 270 High Street, Holliston, MA 01746, (US) 
GEROVAC, Branko, J., 116 Boston Post Road, Marlboro, MA 01752, (US) 
ARMSTRONG, William, Paul, 7080 South 2870 East, Salt Lake City, UT 84121, 
(US) 

GIBSON, Ellen, Sarah, 839 East South Temple, Salt Lake City, UT 84102, 
(US) 

SHAPIRO, Raymond, Elliott, 29 Hunter Avenue, Marlboro, MA 01752, (US) 
RUSHFORTH, Kevin, C, 450 N Mathilda Ave S207, Sunnyvale, CA 94086, (US) 
ROACH, William, C, 580 Arapeen Drive, Salt Lake City, UT 84108, (US) 

LEGAL REPRESENTATIVE: 

Betten, Jurgen, Dipl.-Ing. et al (38515), Betten & Resch Patentanwalte 
Reichenbachstrasse 19, D-80469 Munchen, (DE) 

PATENT (CC, No, Kind, Date) : EP 329771 A1 890830 (Basic) 

EP 329771 B1 960424 
WO 8901664 890223 

APPLICATION (CC, No, Date) : EP 88908489 880812; WO 88US2727 880812 

PRIORITY (CC, No, Date) : US 85081 870813 

DESIGNATED STATES: DE; FR; GB; IT; NL 

INTERNATIONAL PATENT CLASS: G06T-017/00; 

CITED PATENTS (WO A) : US 4315310 A; US 4509115 A 

CITED REFERENCES (EP A) : 

See also references of WO8901664; 



ABSTRACT EP 329771 A1 

A high performance graphics workstation includes a digital computer 
host and a graphics subsystem. Two- and three-dimensional graphics data 
structures, built by the host, are stored in the graphics subsystem. The 
asynchronous traversal of the data structures together with traversal 
control functions coordinate and control the flow of graphics data and 
commands to a graphics pipeline for processing and display. The address 
space of the graphics subsystem is mapped into a reserved I/O space of 
the host. This permits the host to directly access the graphics 
subsystem. 

ABSTRACT WORD COUNT: 91 

NOTE: 

No A-document published by EPO 



LEGAL STATUS (Type, Pub Date, Kind, Text):

Application:   890830 A1 Published application (A1 with Search Report;
               A2 without Search Report)
Examination:   890830 A1 Date of filing of request for examination:
               890502
Change:        891025 A1 Inventor (change)
Change:        900307 A1 Inventor (change)
Examination:   920311 A1 Date of despatch of first examination report:
               920123
Grant:         960424 B1 Granted patent
Oppn None:     970416 B1 No opposition filed
LANGUAGE (Publication, Procedural, Application): English; English; English
FULLTEXT AVAILABILITY:
Available Text 
CLAIMS A 

ORDER fax of complete patent from Dialog SourceOne. See HELP ORDER 348 
Computer graphic apparatus for processing lighting model information. 
Rechnergraphikgerat zur Verarbeitung von Beleuchtungsmodellinformation. 
Appareil graphique a calculateur pour le traitement d'information de modele 
d'eclairage. 
PATENT ASSIGNEE: 

International Business Machines Corporation, (200120) , Old Orchard Road, 
Armonk, N.Y. 10504, (US), (applicant designated states: DE; FR; GB; IT) 
INVENTOR: 

Gonzalez-Lopez, Jorge, 8 Hewlett Road, Red Hook New York 12571, (US) 
Hempel, Bruce Carlton, Lasher Road, Tivoli New York 12583, (US) 
Liang, Bob Chao-Chu, Ryan Drive, West Hurley New York 12491, (US) 
LEGAL REPRESENTATIVE: 

Burt, Roger James, Dr. et al (52152), IBM United Kingdom Limited 
Intellectual Property Department Hursley Park, Winchester Hampshire 
SO21 2JN, (GB) 

PATENT (CC, No, Kind, Date): EP 314341 A2 890503 (Basic) 

EP 314341 A3 910724 
EP 314341 B1 950315 
APPLICATION (CC, No, Date) : EP 88309573 881013; 
PRIORITY (CC, No, Date): US 115467 871030 
DESIGNATED STATES: DE; FR; GB; IT 
INTERNATIONAL PATENT CLASS: G06T-011/00; 
CITED PATENTS (EP A) : US 4343037 A; EP 193151 A 

ABSTRACT EP 314341 A2 

A lighting model processing system for a computer graphics 
workstation's shading function includes multiple floating point 
processing stages arranged and operated in pipeline. Each stage is 
constructed from one or more identical floating point processors. The 
lighting model processing system supports one or more light sources 
illuminating an object to be displayed, with parallel or perspective 
projection. Dynamic partitioning can be used to balance the computational 
workload among various of the processors in order to avoid a bottleneck 
in the pipeline. The high throughput of the pipeline system makes 
possible the rapid calculation and display of high quality shaded images. 

Single node image for a multiple processor network node. 
Vorstellung des Bildes eines einzigen Knotens fur einen Netzwerkknoten mit 
mehreren Prozessoren. 
Presentation de l'image d'un noeud unique pour un noeud de reseau avec 
plusieurs processeurs. 

PATENT ASSIGNEE: 

International Business Machines Corporation, (200120) , Old Orchard Road, 
Armonk, N.Y. 10504, (US), (applicant designated states: DE; FR; GB) 
INVENTOR: 

Halim, Nagui, 1845 Maple Hill Street, Yorktown Heights, N.Y. 10598, (US) 
Nikolaou, Christos Nicholas, 121 West 79th Street, Apt. 1R, New York, 
N.Y. 10024, (US) 

Pershing Jr., John Arthur, 29-C Scenic Drive, Croton-on-Hudson, N.Y. 
10521, (US) 
LEGAL REPRESENTATIVE: 

Jost, Ottokarl, Dipl.-Ing. (6092), IBM Deutschland Informationssysteme 
GmbH, Patentwesen und Urheberrecht, D-70548 Stuttgart, (DE) 
PATENT (CC, No, Kind, Date) : EP 314909 A2 890510 (Basic) 

EP 314909 A3 911030 
EP 314909 B1 950308 
APPLICATION (CC, No, Date) : EP 88115361 880920; 
PRIORITY (CC, No, Date) : US 116424 871103 
DESIGNATED STATES: DE; FR; GB 
INTERNATIONAL PATENT CLASS: G06F-015/16; 
CITED PATENTS (EP A) : EP 118037 A 
CITED REFERENCES (EP A) : 

DATA COMMUNICATIONS, vol. 16, no. 2, February 1987, NEW YORK US pages 116 

- 134; T.J.Routt: "A network architecture gets on track" 
Proceedings IEEE/AIAA 7th digital avionics systems conference 13 October 
1986, Fort Worth, Texas, US pages 536 - 544; D.B.Evans: "Fault tolerant 
high-speed switched data network" 
IBM TECHNICAL DISCLOSURE BULLETIN, vol. 28, no. 8, January 1986, NEW YORK 
US pages 3513 - 3517; "Establishing virtual circuits in large computer 
networks" ; 

ABSTRACT EP 314909 A2 

A method and apparatus for coupled computer systems provides a single 
network node image when connected to a computer network, so that the 
network is unaware of the "fine" structure of the computer systems in the 
machine room. The coupled complex is made available to the network. 

ABSTRACT WORD COUNT: 51 
Application: 890510 A2 Published application (A1 with Search Report; 
             A2 without Search Report) 

(c) 2000 European Patent Office. All rts. reserv. 

00298509 

ORDER fax of complete patent from Dialog SourceOne. See HELP ORDER 348 

An aperiodic mapping method to enhance power-of-two stride access to 
interleaved devices. 
Nichtperiodisches Abbildungsverfahren zum verbesserten Zweierpotenzzugriff 
fur ineinandergreifende Einrichtungen. 
Methode de transformation aperiodique pour ameliorer l'acces, par pas de 
puissance de deux, a des dispositifs entrelaces. 
PATENT ASSIGNEE: 

International Business Machines Corporation, (200120), Old Orchard Road, 
Armonk, N.Y. 10504, (US), (applicant designated states: DE; FR; GB) 
INVENTOR: 

McAuliffe, Kevin Patrik, 3517 Strang Boulevard, Yorktown Heights N.Y. 
10598, (US) 

Melton, Evelyn Au, 20 Rothenburg Road, Poughkeepsie New York 12603, (US) 
Norton, Vern Alan, 11 Ridge Road, Croton-on-Hudson New York 10520, (US) 
Pfister, Gregory Francis, 780 Pleasantville Road, Briarcliff Manor New 
York 10510, (US) 

Wakefield, Scott Philip, 44 Hunter Place, Croton-on-Hudson New York 12520, 
(US) 
LEGAL REPRESENTATIVE: 

Schafer, Wolfgang, Dipl.-Ing. (62021), IBM Deutschland 
Informationssysteme GmbH Patentwesen und Urheberrecht, D-70548 
Stuttgart, (DE) 

PATENT (CC, No, Kind, Date) : EP 313788 A2 890503 (Basic) 

EP 313788 A3 900801 
EP 313788 B1 950621 
APPLICATION (CC, No, Date): EP 88115088 880915; 
PRIORITY (CC, No, Date): US 114909 871029 
DESIGNATED STATES: DE; FR; GB 
INTERNATIONAL PATENT CLASS: G06F-012/02; 
CITED PATENTS (EP A) : EP 179401 A; US 4400768 A 
CITED REFERENCES (EP A) : 

TRANSACTIONS OF THE I.E.C.E. OF JAPAN, vol. E65, no. 8, August 1982, 
pages 464-471; S. SHIMIZU et al.: "A new addressing scheme with 
reorganizable memory structure -basic principle-" 
THE 13TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, Tokyo, 
2nd - 5th June 1986, pages 324-328, IEEE, New York, US; D.T. HARPER et 
al.: "Performance evaluation of vector accesses in parallel memories 
using a skewed storage scheme" 
IBM TECHNICAL DISCLOSURE BULLETIN, vol. 25, no. 8, January 1983, pages 
4445-4449, New York, US; R.N. LANGMAID: "Versatile programmable logic 
array" ; 

ABSTRACT EP 313788 A2 

An aperiodic mapping procedure for the mapping of logical to physical 
addresses is defined as a permutation function for generating optimized 
stride accesses in an interleaved multiple device system such as a large, 
parallel processing shared memory system wherein the function comprises a 
bit-matrix multiplication of a presented first (logical) address with a 
predetermined matrix to produce a second (physical) address. The 
permutation function maps the address from a first to a second address 
space for improved memory performance in such an interleaved memory 
system. The memory is assumed to have n logical address bits and 2^d 
separately accessible memory devices (where d <= n), and the second 
address uses n - d bits of the first address as the offset within the 
referenced device node. The procedure includes performing a bit matrix 
multiplication between successive rows of the said matrix and bits of the 
first address to produce successive d bits of the second address. 
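
The bit-matrix multiplication described in this abstract can be illustrated
with a minimal sketch. The matrix rows, the sizes n = 4 and d = 2, and the
use of the upper n - d address bits as the offset are assumptions chosen
only for illustration; the abstract does not give the actual matrix. Each
device bit is the parity (sum modulo 2) of the address bits selected by one
matrix row.

```python
def parity(x):
    """Return the XOR of all bits of x (a dot product over GF(2))."""
    return bin(x).count("1") & 1


def map_address(addr, rows):
    """Map a logical address to a (device, offset) pair.

    rows[i] is a bit mask holding row i of the (hypothetical) matrix;
    device bit i is the parity of the address bits that the row selects.
    """
    d = len(rows)
    device = 0
    for i, row in enumerate(rows):
        device |= parity(row & addr) << i
    # Assumption: the offset is the upper n - d bits of the logical address.
    offset = addr >> d
    return device, offset


# Hypothetical 2 x 4 matrix: device bit 0 = a0 XOR a2, device bit 1 = a1 XOR a3.
ROWS = [0b0101, 0b1010]


def devices_for_stride(stride, count, rows):
    """List the devices touched by `count` accesses with the given stride."""
    return [map_address(i * stride, rows)[0] for i in range(count)]
```

With plain interleaving (device = address mod 4), a stride of 4 would send
every access to device 0. With the example matrix, the same stride visits
all four devices, and the mapping remains one-to-one over the 16 addresses.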

ABSTRACT WORD COUNT: 161 



LEGAL STATUS (Type, Pub Date, Kind, Text):

Application:   890503 A2 Published application (A1 with Search Report;
               A2 without Search Report)
Examination:   891004 A2 Date of filing of request for examination:
               890809
Search Report: 900801 A3 Separate publication of the European or
               International search report
Examination:   921223 A2 Date of despatch of first examination report:
               921109
Grant:         950621 B1 Granted patent
Lapse:         960501 B1 Date of lapse of the European patent in a
               Contracting State: FR 951117
Oppn None:     960612 B1 No opposition filed
LANGUAGE (Publication, Procedural, Application): English; English; English
FULLTEXT AVAILABILITY:

(c) 2000 European Patent Office. All rts. reserv. 

00284770 

ORDER fax of complete patent from Dialog SourceOne. See HELP ORDER 348 
Raster display vector generator. 
Vektorgenerator fur Raster-Bildschirmanzeige. 
Generateur de trace de vecteur pour l'affichage video a balayage par trame. 

PATENT ASSIGNEE: 

International Business Machines Corporation, (200120), Old Orchard Road, 
Armonk, N.Y. 10504, (US), (applicant designated states: DE; FR; GB; IT) 
INVENTOR: 

Lumelsky, Leon, 30 Gaxton Road, Stamford Connecticut 06905, (US) 
LEGAL REPRESENTATIVE: 

Burt, Roger James, Dr. (52152), IBM United Kingdom Limited Intellectual 
Property Department Hursley Park, Winchester Hampshire SO21 2JN, (GB) 
PATENT (CC, No, Kind, Date) : EP 279227 A2 880824 (Basic) 

EP 279227 A3 910417 
EP 279227 B1 940518 
APPLICATION (CC, No, Date) : EP 88101080 880126; 
PRIORITY (CC, No, Date) : US 13848 870212 
DESIGNATED STATES: DE; FR; GB; IT 

INTERNATIONAL PATENT CLASS: G09G-001/16; G09G-005/36; 

CITED PATENTS (EP A) : US 4642625 A; EP 164880 A; US 4580236 A; WO 8500679 A; 
US 3906480 A 

ABSTRACT EP 279227 A2 

A vector generator for use with an all-points-addressable frame buffer 
capable of the non-word aligned access, simultaneously, of a square M by 
N array of pixels providing fast vector drawing independently of vector 
slope and position in the whole screen area of an attached display 
monitor. The vector generator utilises a triangular logic matrix together 
with a line drawing unit to generate M vector bits lying in an M by N 
square matrix of the screen of an attached monitor in one memory cycle of 
the frame buffer and uses the generated matrix to generate a direct mask 
for the frame buffer whereby the M bit vector may be stored in a single 
memory cycle. 

ABSTRACT WORD COUNT: 119 



LEGAL STATUS (Type, Pub Date, Kind, Text):

Application:   880824 A2 Published application (A1 with Search Report;
               A2 without Search Report)
Examination:   A2 Date of filing of request for examination: 881130
Change:        910410 Obligatory supplementary classification
Search Report: 910417 Separate publication of the European or
Change:
Examination:   Date of despatch of first examination report
Grant:         940518 Granted patent
               No opposition filed

(c) 2000 European Patent Office. All rts. reserv. 
00282118 

ORDER fax of complete patent from Dialog SourceOne. See HELP ORDER 348 
DATAFLOW PROCESSING ELEMENT, MULTIPROCESSOR, AND PROCESSES. 
DATENFLUSSVERARBEITUNGSELEMENT -MULTIPROZESSOR UND -VERFAHREN. 
ELEMENT DE TRAITEMENT DE FLUX DE DONNEES, MULTIPROCESSEUR ET PROCEDES. 

PATENT ASSIGNEE: 

DENNIS, Jack B., (948130), 55 Wellesley Road, Belmont, MA 02178, (US), 
(applicant designated states: BE; CH; DE; FR; GB; IT; LI; NL; SE) 
INVENTOR: 

DENNIS, Jack B., 55 Wellesley Road, Belmont, MA 02178, (US) 
LEGAL REPRESENTATIVE: 

Driver, Virginia Rozanne et al (58901), Haseltine Lake & Co. Hazlitt 
House 28 Southampton Buildings Chancery Lane, London WC2A 1AT, (GB) 
PATENT (CC, No, Kind, Date) : EP 315647 A1 890517 (Basic) 

EP 315647 A1 910130 
WO 8800732 880128 
APPLICATION (CC, No, Date) : EP 87905809 870713; WO 87US1668 870713 
PRIORITY (CC, No, Date): US 885836 860715 
DESIGNATED STATES: BE; CH; DE; FR; GB; IT; LI; NL; SE 
INTERNATIONAL PATENT CLASS: G06F-003/00; G06F-009/30; G06F-009/36; 

G06F-009/38; G06F-009/40; G06F-013/00; 
CITED PATENTS (WO A): US 4153932 A; US 4197589 A; US 4644461 A; US 4591979 

A; US 4413318 A 
CITED REFERENCES (EP A) : 

PROCEEDINGS 3RD CONFERENCE ON DIGITAL AVIONICS SYSTEMS, Fort Worth, Texas, 
November 1979, pages 19-25, IEEE, New York, US; M. CORNISH et al.: "The 
TI data flow architectures: The power of concurrency for avionics" 
THE COMPUTER JOURNAL, vol. 25, no. 2, May 1982, pages 207-217; P.C. 
TRELEAVEN et al.: "Combining data flow and control flow computing" 
See also references of WO8800732; 
CITED REFERENCES (WO A) : 

JENKINS, RICHARD A., "Supercomputers of Today and Tomorrow", Tab Books 
Inc., Blue Ridge Summit, PA, 1986, pp. 92-94. 
REISIG, WOLFGANG, "Petri Nets", New York, NY, 1982, Chapters 1 and 3. 
HWANG, KAI and BRIGGS, FAYE A., "Computer Architecture and Parallel 
Processing", McGraw-Hill, Inc., NY, 1984, Sections 10.1 and 10.2.; 
NOTE : 

No A-document published by EPO 
LEGAL STATUS (Type, Pub Date, Kind, Text) : 
Application:   890517 A1 Published application (A1 with Search Report;
               A2 without Search Report)
Examination:   890517 A1 Date of filing of request for examination:
               890113
Change:        890830 A1 Representative (change)
Search Report: 910130 A1 Drawing up of a supplementary European search
               report: 901211
Withdrawal:    930804 A1 Date on which the European patent application
               was deemed to be withdrawn: 930202
LANGUAGE (Publication, Procedural, Application): English; English; English



27/5/22 (Item 22 from file: 348) 

DIALOG (R) File 348: European Patents 

(c) 2000 European Patent Office. All rts. reserv. 

00238836 

ORDER fax of complete patent from Dialog SourceOne. See HELP ORDER 348 
Computer. 
Rechner. 
Ordinateur. 
PATENT ASSIGNEE: 

Thomas, Gerhard G., (843610), Weinmeisterhornweg 80, D-1000 Berlin 20, 

(DE), (applicant designated states: CH; DE; FR; GB; IT; LI; SE) 
Mitterauer, Bernhard Dr., (843620), Viehhausen 59, A-5071 Wals bei 
Salzburg, (AT), (applicant designated states: CH; DE; FR; GB; IT; LI; SE) 
INVENTOR: 

Thomas, Gerhard G., Weinmeisterhornweg 80, D-1000 Berlin 20, (DE) 
Mitterauer, Bernhard Dr., Viehhausen 59, A-5071 Wals bei Salzburg, (AT) 

LEGAL REPRESENTATIVE: 

Haft, Berngruber, Czybulka, Postfach 14 02 46, D-8000 Munchen 5, (DE) 

PATENT (CC, No, Kind, Date) : EP 235764 A2 870909 (Basic) 

EP 235764 A3 880907 

APPLICATION (CC, No, Date): EP 87102829 870227; 

PRIORITY (CC, No, Date): DE 3607241 860305 

DESIGNATED STATES: CH; DE; FR; GB; IT; LI; SE 

INTERNATIONAL PATENT CLASS: G06F-015/06 

CITED PATENTS (EP A) : DE 3429078 A; US 4518866 A; EP 132926 A; US 3473160 A 
CITED REFERENCES (EP A) : 

SUPPLEMENTO AI RENDICONTI DEL CIRCOLO MATEMATICO DI PALERMO, Serie II/2, 
1982, pages 275-286; G.G. THOMAS: "On permutographs" 
PROCEEDINGS OF THE ASSOCIATION FOR COMPUTING MACHINERY, San Francisco, 
CA, 8-14 October 1984, pages 212-221, North-Holland/ACM, Amsterdam, 
NL; D.I. MOLDOVAN: "An associative array architecture intended for 
semantic network processing"; 



ABSTRACT EP 235764 A2 
Computer. 

The invention relates to a computer, in particular for the simulation of 
biological processes. The core of the computer is a central 
logic/computing system (2) constructed as an n-valued permutograph. The 
negation network of this permutograph consists of individual node 
computers, which are connected to other node computers via information 
and negation lines (22, 32). In each node computer (21), the negation 
network of the permutograph is contained in a subnode unit (26). The 
overall computer can be controlled externally or internally, so that an 
intentional computer results. 
ABSTRACT WORD COUNT: 79 

LEGAL STATUS (Type, Pub Date, Kind, Text):

Application:   870909 A2 Published application (A1 with Search Report;
               A2 without Search Report)
Change:        880817 A2 International patent classification (change)
Search Report: 880907 A3 Separate publication of the European or
               International search report
Examination:   890315 A2 Date of filing of request for examination:
               890116
Examination:   901031 A2 Date of despatch of first examination report:
               900918
Withdrawal:    930714 A2 Date on which the European patent application
               was deemed to be withdrawn: 930120
LANGUAGE (Publication, Procedural, Application): German; German; German



27/5/23 (Item 23 from file: 348) 

DIALOG (R) File 348: European Patents 

(c) 2000 European Patent Office. All rts. reserv. 

00217101 

ORDER fax of complete patent from Dialog SourceOne. See HELP ORDER 348 

Switching system for transmission of data. 

Vermittlungssystem fur Datenubertragung. 

Systeme de commutation pour la transmission de donnees. 

PATENT ASSIGNEE: 
International Business Machines Corporation, (200120), Old Orchard Road, 
Armonk, N.Y. 10504, (US), (applicant designated states: DE; FR; GB; IT) 
INVENTOR: 

Franaszek, Peter Anthony, P.O. Box 218, Yorktown Heights, New York 10598, 
(US) 

LEGAL REPRESENTATIVE: 

Atchley, Martin John Waldegrave (27831), IBM United Kingdom Limited 
Intellectual Property Department Hursley Park, Winchester Hampshire 
SO21 2JN, (GB) 

PATENT (CC, No, Kind, Date) : EP 195589 A2 860924 (Basic) 

EP 195589 A3 890719 

EP 195589 B1 920610 
APPLICATION (CC, No, Date) : EP 86301778 860312; 
PRIORITY (CC, No, Date) : US 713117 850318 
DESIGNATED STATES: DE; FR; GB; IT 

INTERNATIONAL PATENT CLASS: G06F-015/16; H04Q-003/68; 

CITED REFERENCES (EP A) : 

AFIPS CONFERENCE PROCEEDINGS, Chicago, Illinois, 4th-7th May 1981, pages 
125-135, AFIPS Press, Arlington, Virginia, US; B. QUATEMBER: "Modular 
crossbar switch for large-scale multiprocessor systems-structure and 
implementation" 

MICROPROCESSORS AND MICROSYSTEMS, vol. 7, no. 2, March 1983, pages 75-79, 
Butterworth & Co., (Publishers) Ltd, Whitstable, Kent, GB; B. WILKINSON 
et al.: "Cross-bar switch multiple microprocessor system" 

COMPUTER, vol. 14, no. 12, December 1981, pages 43-53, IEEE, Long Beach, 
CA, US; D.M. DIAS et al.: "Packet switching interconnection networks 
for modular systems" 

IBM TECHNICAL DISCLOSURE BULLETIN, vol. 25, no. 7A, December 1982, pages 
3578-3582, New York, US; E.R. MARSH: "Data base control and processing 
system" ; 

ABSTRACT EP 195589 A2 

A switching system for transmission of data comprises a switching 
matrix (34) partitioned into a plurality of selectable data transmission 
paths, these paths providing connections between each of a plurality of 
first ports of the matrix and selected ones of a plurality of second 
ports of the matrix, first path control means (30, 40) for controlling 
each data transmission path for completing each selected connection, and 
system control means (32, 42) responsive to a message requesting a 
connection between a first port and a selected second port to establish 
the requested connection. 

The switching system is characterised in that the system control means 
provides for the establishment of the requested connection beginning at a 
determined time based upon prior established connections to the selected 
second port, and the path control means (40) establishes the requested 
connection at the determined time so as to provide for transmission of 
data from the first port to the selected second port. 
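
One possible reading of the characterising clause is sketched below. The
class name, the per-port `free_at` table, and the assumption that each
request states its transmission duration are all hypothetical; the abstract
does not say how the determined time is computed. The sketch only shows a
start time being derived from connections already granted to the same
second port.

```python
class SwitchScheduler:
    """Toy model: grants each connection a start time per destination port."""

    def __init__(self, num_second_ports):
        # Time at which each second (destination) port next becomes free.
        self.free_at = [0] * num_second_ports

    def request(self, first_port, second_port, now, duration):
        """Return the determined start time for the requested connection.

        The start time is the later of the request time and the end of the
        connections previously established to the selected second port.
        """
        start = max(now, self.free_at[second_port])
        self.free_at[second_port] = start + duration
        return start
```

For example, if port 0 is granted second port 2 at time 0 for 5 time units,
a request from port 1 for the same second port arriving at time 1 would be
given start time 5, while a request for a different second port would start
immediately.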

ABSTRACT WORD COUNT: 161 

LEGAL STATUS (Type, Pub Date, Kind, Text):

Application:   860924 A2 Published application (A1 with Search Report;
               A2 without Search Report)
Examination:   870325 A2 Date of filing of request for examination:
               870116
Search Report: 890719 A3 Separate publication of the European or
               International search report
Examination:   910710 A2 Date of despatch of first examination report:
               910527
Grant:         920610 B1 Granted patent
Oppn None:     930602 B1 No opposition filed
Lapse:         970423 B1 Date of lapse of the European patent in a
               Contracting State: DE 961203
Lapse:         991020 B1 Date of lapse of European Patent in a
               contracting state (Country, date): IT 19920610,

LANGUAGE ( Publication, Procedural, Application) : English; English; English 
FULLTEXT AVAILABILITY: 



Available Text 
CLAIMS B 
CLAIMS B 
CLAIMS B 
SPEC B 
Total word count 
Total word count 
Total word count 

DIALOG (R) File 348: European Patents 

(c) 2000 European Patent Office. All rts. reserv. 

00160956 

ORDER fax of complete patent from Dialog SourceOne. See HELP ORDER 348 
Computer systems for curve-solid classification and solid modeling. 
Rechnersysteme zur Kurvenkorperklassifizierung und Korpermodellierung. 
Systemes de calculateurs pour la classification de solides courbes et la 
modelisation de solides. 
PATENT ASSIGNEE: 

THE UNIVERSITY OF ROCHESTER, (290263), Office of Research and Project 
Administration, 30 Administration Building, Rochester, New York 14627, 
(US), (applicant designated states: AT; BE; CH; DE; FR; GB; IT; LI; LU; NL; SE) 
INVENTOR: 

Kedem, Gershon, 275 Ashbourne Road, Rochester, N.Y. 14618, (US) 
Ellis, John L., 226 Jeffords Road, Rush, N.Y. 14543, (US) 
LEGAL REPRESENTATIVE: 

Wagner, Karl H. et al (12561), WAGNER & GEYER Patentanwalte 
Gewurzmuhlstrasse 5, D-80538 Munchen, (DE) 
PATENT (CC, No, Kind, Date) : EP 160848 A2 851113 (Basic) 

EP 160848 A3 881005 
EP 160848 B1 931201 
APPLICATION (CC, No, Date): EP 85104163 850404; 
PRIORITY (CC, No, Date): US 608295 840508 

DESIGNATED STATES: AT; BE; CH; DE; FR; GB; IT; LI; LU; NL; SE 
INTERNATIONAL PATENT CLASS: G06F-015/72; 
CITED REFERENCES (EP A) : 

COMPUTER GRAPHICS AND IMAGE PROCESSING, vol. 18, no. 2, February 1982, 
pages 109-144, Academic Press Inc., New York, US; S.D. ROTH: "Ray 
casting for modeling solids" 
IBM TECHNICAL DISCLOSURE BULLETIN, vol. 23, no. 9, February 1981, pages 
3996-4005, New York, US; S. BOINODIRIS: "Computer graphics using 
multi-echelon processing structures"; 

ABSTRACT EP 160848 A2 

Computer systems for curve-solid classification and solid modeling. 

A computer system is introduced for curve-solid classification 
(ray casting) of objects in constructive solid geometry (CSG) modeling to 
produce image representations of two- and three-dimensional objects. The 
system carries out curve-solid classifications in parallel and at much 
higher speed than a general purpose computer. It uses primitive 
classification processors, which compute in parallel all of the 
intersections between the curve (line or ray) and the primitives (basic 
solid bodies: block, cylinder, etc.); combine processors, which are 
connected into a binary tree that duplicates the binary tree defining the 
CSG solid and compute the set operations (union, intersection and 
difference); and a host computer. 
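
The combine step can be illustrated in software with a minimal sketch. It is
sequential, whereas the abstract describes parallel hardware, and the data
layout is an assumption: each primitive's classification is taken to be a
sorted list of disjoint (t_in, t_out) intervals along the ray, and a CSG
node is a tuple (operation, left, right).

```python
def combine(op, a, b):
    """Combine two interval lists with a set operation along one ray."""
    # Every interval endpoint is a place where membership can change.
    points = sorted({t for interval in a + b for t in interval})

    def inside(intervals, t):
        return any(lo <= t < hi for lo, hi in intervals)

    result = []
    for lo, hi in zip(points, points[1:]):
        mid = (lo + hi) / 2
        in_a, in_b = inside(a, mid), inside(b, mid)
        keep = {
            "union": in_a or in_b,
            "intersection": in_a and in_b,
            "difference": in_a and not in_b,
        }[op]
        if keep:
            if result and result[-1][1] == lo:
                # Merge with the previous segment when they touch.
                result[-1] = (result[-1][0], hi)
            else:
                result.append((lo, hi))
    return result


def classify(node):
    """Evaluate a CSG tree: leaves are interval lists, inner nodes are ops."""
    if isinstance(node, list):
        return node
    op, left, right = node
    return combine(op, classify(left), classify(right))
```

Each inner node of the tree plays the role of one combine processor, and
each leaf holds the output of one primitive classification processor for
the same ray.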
ABSTRACT WORD COUNT: 107 
Assignee: 
20000202 B1 Date of lapse of European Patent in a 
contracting state (Country, date) : AT 
19931201, BE 19931201, CH 19931201, LI 
19931201, IT 19931201, LU 19940430, NL 
19931201, SE 19931201, 

Published application (A1 with Search Report; 
A2 without Search Report) 
Separate publication of the European or 
International search report 
Date of filing of request for examination: 
890331 

Applicant (transfer of rights) (change) : THE 
UNIVERSITY OF ROCHESTER (290263) Office of 
Research and Project Administration, 30 
Administration Building Rochester, New York 
14627 (US) (applicant designated states: 
AT; BE; CH; DE; FR; GB; IT; LI; LU; NL; SE) 
Date of despatch of first examination report: 
910424 



Contracting State: CH 931201, LI 931201 
Date of lapse of the European patent in a 
Contracting State: CH 931201, LI 931201 
Date of lapse of the European patent in a 
Contracting State: CH 931201, LI 931201, NL 
931201 

Date of lapse of the European patent in a 
Contracting State: CH 931201, LI 931201, NL 
931201, SE 931201 

Date of lapse of the European patent in a 
Contracting State: AT 931201, CH 931201, LI 
931201, NL 931201, SE 931201 



Contracting State: AT 931201, BE 931201, CH 
931201, LI 931201, NL 931201, SE 931201 
Date of lapse of the European patent in a 
Contracting State: AT 931201, BE 931201, CH 
931201, LI 931201, GB 960404, NL 931201, SE 
931201 
contracting state (Country, date): AT 19931201, BE 19931201, CH 
19931201, LI 19931201, IT 19931201, NL 19931201, SE 19931201, 

LANGUAGE ( Publication, Procedural , Application) : English; English; English 

DIALOG (R) File 348: European Patents 

(c) 2000 European Patent Office. All rts. reserv. 

00145956 

ORDER fax of complete patent from Dialog SourceOne. See HELP ORDER 348 

System and method for a data processing pipeline. 

System und Verfahren zur Datenverarbeitungspipeline. 

Systeme et methode pour un pipeline de traitement de donnees. 

PATENT ASSIGNEE: 

Robert Bosch Corporation, P.O. Box 31816 2300 South 2300 West, Salt Lake 
City Utah 84131, (US), (applicant designated states: DE; FR; GB; IT) 
INVENTOR: 

Andrews, David Heber, 6435 South Tresa Drive, West Jordan Utah 84084, 
(US) 
ABSTRACT EP 146250 A2 

System and method for a data processing pipeline. 

A data processing system for processing encoded control points 
representing graphical illustrations comprises a number of separate 
micro-programmed circuit cards, each of which is programmed to perform a 
specific processing operation. 

A command is first sent to a matrix maker card (201) defining a 
geometrical transformation to be performed on the graphical illustration. 
This card, together with a matrix multiplier card (202), then calculates 
a transformation matrix representing the desired transformation. 

Electronic representations of control data points are then transmitted 
to the pipeline for processing and multiplied by the transformation 
matrix, computed previously, in a vector multiplier circuit card (203). 
Next, the control points are clipped to the planes of a viewing frustum 
by a number of clipper cards (205-209), one card for each clipping plane. 
The 3D control points are then mapped onto the 2D viewing window by a 
viewpoint card (210). 

The clipped control points are then exploded to generate a plurality of 
small line segments representing each of the curved edges of the 
illustration. Finally, the appropriate portions of the illustration are 
rendered as a line drawing, in accordance with the code attached to the 
various control points; and the processed data is then converted into a 
form which is appropriate for scan conversion. 
ABSTRACT WORD COUNT: 217 
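
The matrix maker, matrix multiplier and vector multiplier stages described in the abstract above can be sketched in software as follows. This is only an illustration under stated assumptions, not the patented circuitry: the function names (translation, scaling, matmul, transform), the 4 x 4 homogeneous representation and the column-vector convention are all invented for the example, and the clipping and viewport stages are omitted.

```python
# Illustrative sketch only. All names and the homogeneous column-vector
# convention are assumptions; the abstract does not specify them.

def translation(tx, ty, tz):
    """Return a 4 x 4 matrix that moves a point by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


def scaling(s):
    """Return a 4 x 4 matrix that scales a point uniformly by s."""
    return [[s, 0, 0, 0],
            [0, s, 0, 0],
            [0, 0, s, 0],
            [0, 0, 0, 1]]


def matmul(a, b):
    """Multiply two 4 x 4 matrices (the 'matrix multiplier' step)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, point):
    """Apply matrix m to one 3D control point (the 'vector multiplier' step)."""
    x, y, z = point
    v = (x, y, z, 1)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


# Compose once: scale by 2 first, then translate by (1, 2, 3).
matrix = matmul(translation(1, 2, 3), scaling(2))

# Stream every control point through the precomputed matrix.
control_points = [(0, 0, 0), (1, 1, 1)]
transformed = [transform(matrix, p) for p in control_points]
```

Computing the combined matrix once and reusing it for every control point mirrors the order of operations in the abstract, where the transformation matrix is calculated before any control points enter the pipeline.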
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English Abstract 

A distributed communication system comprising a number of nodes, 
comprising a number of resources, which nodes are interconnected by an 
interconnection network. Distributed applications are executed through 
sending messages between resources in said nodes. The resources are 
categorized into a number of function types wherein resources grouped 
into one and the same function type are functionally equivalent at least 
to a given extent so that a number of function type instances are 
provided for each function type. Each node comprises information holding 
means (11) keeping information about which function type instances 
correspond to a given function type and distribution functions (12) 
associated with said information holding means (11) for selecting a 
receiving function type instance among the available instances. A 
resource sending a message only has to give the function type as address 
information and the distribution function (12) selects which function 
type instance will be the receiver. 
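
The addressing scheme in this abstract, where a sender names only a function type and a per-node distribution function picks the receiving instance, can be sketched as below. This is a hedged illustration: the class Node, its methods and the round-robin selection policy are assumptions made for the example. The abstract says only that an instance is selected among the available ones, not how.

```python
from itertools import count


class Node:
    """Hypothetical node holding a function-type registry and a selector."""

    def __init__(self):
        self.instances = {}   # information holding means: type -> instances
        self.inboxes = {}     # instance name -> messages delivered to it
        self._turn = count()  # state for the assumed round-robin policy

    def register(self, function_type, instance):
        self.instances.setdefault(function_type, []).append(instance)
        self.inboxes[instance] = []

    def unregister(self, function_type, instance):
        self.instances[function_type].remove(instance)

    def select(self, function_type):
        """Distribution function: pick one currently available instance."""
        available = self.instances.get(function_type, [])
        if not available:
            raise LookupError("no instance available for " + function_type)
        return available[next(self._turn) % len(available)]

    def send(self, function_type, message):
        """The sender supplies only the function type as the address."""
        receiver = self.select(function_type)
        self.inboxes[receiver].append(message)
        return receiver


node = Node()
for name in ("T20", "T30", "T40"):
    node.register("F10", name)

first = node.send("F10", "MSG-1")
second = node.send("F10", "MSG-2")
node.unregister("F10", "T40")      # an instance is removed at run time
third = node.send("F10", "MSG-3")  # the sender's code does not change
```

Because senders never name a concrete instance, instances can be added or removed without touching the sending code, which is the advantage the record claims.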

French Abstract 

L'invention concerne un systeme de communications reparti comprenant un
certain nombre de noeuds. Ces noeuds contiennent un certain nombre de
ressources et sont interconnectes par un reseau d'interconnexion. Les
applications reparties sont executees par l'envoi de messages entre les
ressources desdits noeuds. Les ressources sont classees par types de
fonctions, les ressources regroupees dans un seul et meme type de
fonction etant au moins dans une certaine mesure fonctionnellement
equivalentes, de telle facon qu'un certain nombre d'instances de types de
fonctions soient fournies pour chaque type de fonction. Chaque noeud
comporte des moyens de fonds d'informations (11) permettant de conserver
des informations sur les differentes instances correspondant a un type de
fonction donne et des fonctions reparties (12) associees a ces moyens de
fonds d'informations (11) et permettant de selectionner parmi les
instances disponibles une instance d'un type de fonction de reception.
Une ressource envoyant un seul message doit donner le type de fonction
comme information d'adresse et la fonction de repartition (12)
selectionne l'instance du type de fonction qui sera receptrice.
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Detailed Description 
Claims 

Fulltext Word Count: 4898 
English Abstract 

In a model-based dynamically configured system (15, 25), various
processing components (65, 75, 87) are created dynamically, interfaced to
each other, and scheduled upon demand. A combination of data-driven and
demand-driven scheduling techniques (Fig. 5, 6) is used to enhance the
effectiveness of the dynamically configured system.

French Abstract 

Dans un systeme configure de maniere dynamique fonde sur un modele (15,
25), plusieurs elements de traitement (65, 75, 87) sont crees
dynamiquement, relies les uns aux autres, et organises en fonction de la
demande. Un melange de techniques d'organisation (Fig. 5, 6) articulees
autour de la base de donnees et de la demande sont utilisees pour
accroitre l'efficacite du systeme configure de maniere dynamique.
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Claims 
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English Abstract 

A single layer parallel distributed processing network (10) is
characterized by having connection weights between nodes that are defined
by an [N x N] information storage matrix (A) that satisfies the matrix
equation: [A][T] = [T][LAMBDA], where [LAMBDA] is an [N x N] diagonal
matrix the components of which are the eigenvalues of the matrix [A] and
[T] is an [N x N] similarity transformation matrix whose columns are
formed of some predetermined number M of target vectors (where M <= N)
and whose remaining columns are formed of some predetermined number Q of
slack vectors (where Q = N - M), both of which together comprise the
eigenvectors of [A].
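
As a small worked check of the matrix equation in this abstract, the sketch below builds [A] from a chosen [T] and [LAMBDA] for N = 2 (M = 1 target vector, Q = 1 slack vector) and confirms that [A][T] = [T][LAMBDA]. The particular vectors and eigenvalues are assumptions picked for illustration; the record does not give numerical values, and the helper names are hypothetical.

```python
from fractions import Fraction


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def inverse_2x2(m):
    """Invert a 2 x 2 matrix; raises if it is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("T must have linearly independent columns")
    return [[d / det, -b / det], [-c / det, a / det]]


# Assumed example values: one target vector and one slack vector.
target = (Fraction(1), Fraction(1))
slack = (Fraction(1), Fraction(-1))

# Columns of T are the target vector followed by the slack vector.
T = [[target[0], slack[0]],
     [target[1], slack[1]]]

# Assumed eigenvalues: 1 for the target vector, 1/2 for the slack vector.
LAMBDA = [[Fraction(1), Fraction(0)],
          [Fraction(0), Fraction(1, 2)]]

# A = T * LAMBDA * T^-1 is one way to obtain a matrix satisfying A T = T LAMBDA.
A = matmul(matmul(T, LAMBDA), inverse_2x2(T))

left = matmul(A, T)
right = matmul(T, LAMBDA)
```

Here A works out to [[3/4, 1/4], [1/4, 3/4]], and each column of T is an eigenvector of A with the corresponding diagonal entry of LAMBDA as its eigenvalue.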

French Abstract 

Un reseau monocouche de traitement reparti parallelement (10) est
caracterise en ce qu'il comporte des priorites de connexion entre les
noeuds qui sont definies par une (N x N) matrice de stockage
d'informations (A) qui repond a l'equation matricielle: (A) (T) = (T)
(LAMBDA), ou (LAMBDA) est une (N x N) matrice diagonale dont les
composantes sont les valeurs propres de la matrice (A) et (T) est une (N
x N) matrice de transformation par similitude dont les colonnes sont
formees d'un nombre predetermine M de vecteurs cibles (ou M <= N) et
dont les colonnes restantes sont formees d'un nombre predetermine Q de
vecteurs de remplissage (ou Q = N - M), les deux constituant ensemble les
vecteurs propres de (A).
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Detailed Description 

Claims 

Fulltext Word Count: 25645 
English Abstract 

A high performance graphics workstation includes a digital computer host 
and a graphics subsystem. Two- and three-dimensional graphics data 
structures, built by the host, are stored in the graphics subsystem. The 
asynchronous traversal of the data structures together with traversal 
control functions coordinate and control the flow of graphics data and 
commands to a graphics pipeline for processing and display. The address 
space of the graphics subsystem is mapped into a reversed I/O space of
the host. This permits the host to directly access the graphics 
subsystem. 

French Abstract 

Un poste de travail graphique a haute performance comprend un ordinateur
central numerique et un sous-systeme graphique. Des structures de donnees
graphiques bi-dimensionnelles et tri-dimensionnelles, construites par
l'ordinateur central, sont stockees dans le sous-systeme graphique. Le
parcours asynchrone des structures de donnees avec les fonctions de
commande du parcours coordonnent et commandent le flux des donnees
graphiques et des instructions vers un pipeline de donnees graphiques a
des fins de traitement et d'affichage. L'espace d'adresse du sous-systeme
graphique est topographie dans un espace I/O inverse de l'ordinateur
central. Ceci permet a l'ordinateur central d'avoir un acces direct au
sous-systeme graphique.
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Detailed Description 

Claims 
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English Abstract 

A novel computer design that is capable of utilizing large numbers of 
very large scale integrated (VLSI) circuit chips as a basis for efficient 
high performance computation. This design is a static dataflow 
architecture of the type in which a plurality of data flow processing 
elements (110) communicate externally by means of input/output circuitry 
(128), and internally by means of packets sent through a routing network 
(124) via paths (122). The routing network (124) implements a 
transmission path from any processing element to any other processing 
element. This design effects processing element transactions on data 
according to a distribution of instructions that is at most partially 
ordered. These instructions correspond to the nodes of a directed graph 
in which any pair of nodes connected by an arc corresponds to a 
predecessor-successor pair of instructions. Generally each predecessor 
instruction has one or more successor instructions, and each successor 
instruction has one or more predecessor instructions. In accordance with 
the present invention, these instructions include associations of 
execution components and enable components identified by instruction 
indices.

French Abstract

Un ordinateur ayant une conception novatrice peut utiliser de
grandes quantites de plaquettes a circuits integres a tres grande echelle
(VLSI) comme base efficace de calcul a tres haute performance. Cette
conception est une architecture a flux statique de donnees du type ou une
pluralite d'elements de traitement de flux de donnees (110) communiquent
exterieurement par des circuits d'entree/sortie (128) et interieurement
par des paquets envoyes par un reseau d'acheminement (124) via des
parcours (122). Le reseau d'acheminement (124) met en oeuvre des parcours
de transmission de n'importe quel element de traitement a n'importe quel
autre element de traitement. Cette configuration effectue des
transactions de donnees entre elements de traitement selon des
instructions distribuees de facon tout au plus partiellement ordonnee.
Ces instructions correspondent aux points nodaux d'un graphique de
directions dans lequel toute paire de points nodaux relies par un arc
correspond a une paire d'instructions predecesseur-successeur. En
general, chaque instruction predecesseur a une ou plusieurs instructions
successeur et chaque instruction successeur a une ou plusieurs
instructions predecesseur. Ces instructions font intervenir des
associations de composants d'execution et de composants de validation
identifies par des indices d'instruction.
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Computer graphics system 
Patent Assignee: HEWLETT-PACKARD CO (HEWP ) 
Inventor: KRECH A S; RENTSCHLER E; SCOTT N D 
Number of Countries: 001 Number of Patents: 001 
Patent No   Kind  Date      Applicat No  Kind  Date      Main IPC     Week
US 5940086  A     19990817  US 97781671  A     19970110  G06F-015/16  199939 B
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Patent Details: 

Patent      Kind  Lan  Pg  Filing Notes  Application  Patent
US 5940086  A          18



Abstract (Basic) : US 5940086 A 

NOVELTY - A distributor (118) dynamically allocates each of the 
chunks of the vertex data to one of geometry accelerators (120) to 
provide a corresponding chunk of the rendering data. The allocation is 
based on the relative capability of the accelerators to process the 
vertex data. The relative processing capability is identified based on 
status information provided by accelerators. 

DETAILED DESCRIPTION - Several geometry accelerators (120) are
configured to process vertex data representing graphics primitives
and to provide rendering data. The accelerators generate availability
status information indicating various levels of vertex data
processing capability. The levels represent a series of successively
greater capability to process vertex data. An INDEPENDENT CLAIM is
also included for the method of processing vertex data in a computer
graphics system.
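
The status-driven allocation described above can be sketched as follows. This is a minimal illustration under assumptions: the Accelerator class, its capacity field and the rule "send the chunk to whichever accelerator reports the highest availability level" are invented for the example. The record states only that allocation is based on relative processing capability as reported by the accelerators.

```python
from dataclasses import dataclass, field


@dataclass
class Accelerator:
    """Hypothetical geometry accelerator with a bounded input buffer."""
    name: str
    capacity: int                       # assumed buffer size, in chunks
    pending: list = field(default_factory=list)

    def availability(self) -> int:
        """Status level: a higher value means more room for vertex data."""
        return self.capacity - len(self.pending)


def distribute(chunks, accelerators):
    """Allocate each chunk to the accelerator reporting most availability."""
    assignment = []
    for chunk in chunks:
        # max() keeps the first accelerator on ties.
        target = max(accelerators, key=lambda a: a.availability())
        if target.availability() <= 0:
            raise RuntimeError("all accelerators are full")
        target.pending.append(chunk)
        assignment.append((chunk, target.name))
    return assignment


accelerators = [Accelerator("ga0", capacity=2), Accelerator("ga1", capacity=4)]
plan = distribute(["c0", "c1", "c2", "c3"], accelerators)
```

In this run the larger accelerator ga1 receives three of the four chunks, so neither unit sits idle while the other is overloaded.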

USE - For displaying graphical representations of objects on two 
dimensional video display screen. 

ADVANTAGE - Efficient distribution of vertex data substantially 
reduces the amount of time for which the geometry accelerators remain 
idle, thereby increasing efficiency of accelerators, and overall 
parallel processing of vertex data. Selective utilization of 
geometry accelerators results in significant increase in throughput 
of graphic system. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
the computer graphics system. 

Distributor (118) 

Geometry accelerators (120) 

pp; 18 DwgNo 1/5 
Title Terms: COMPUTER; GRAPHIC ; SYSTEM 
Derwent Class: T01 
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Parallel processing for general purpose multiple instruction multiple 
data computer systems - None 
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Abstract (Basic) : US 5935216 A 

NOVELTY - The method involves writing messages to each neighboring
processor, in the fore and aft order, to overlap the startup, copy and
transfer time of further messages with the transfer time of the previous
message.

DETAILED DESCRIPTION - The messages are read from each of the 
neighboring processors in reverse order to the orientation to overlap 
the startup and transfer time with the copy of the last message. 
Readings are performed until all selected processors are read. The 
computation time of specific processor nodes can be computed using a 
FORTRAN compiler and memory to memory operations. 

USE - For use in general purpose MIMD computer systems. 

ADVANTAGE - Ensures efficient and time saving communication between
processors, input and output devices using parallel processing
methods. Avoids dynamic reconfiguration or load balancing by
allocation of work assignments, and combines data into single messages
where possible; this also reduces the number of synchronizations and
thus the performance cost of synchronization.

DESCRIPTION OF DRAWING (S) - The figure shows the graphic 
representation of a parallel computing system. 

Dwg. 14/15 
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Abstract (Basic) : WO 9930456 A2 

NOVELTY - If a resource of the kind thread (T10) wants to send a
message (MSG), it gives only the function type (F10) of the receiver
as address information. An information holding means (11) records that
T20, T30 and T40 are instances of the function type F10. A distribution
function (12) performs the actual choice of a receiving entity and
chooses from the currently available function type instances. The
message is then sent to the selected instance, for example T30.

DETAILED DESCRIPTION - Independent claims are included for a 
distributed communication system, for a node in such a system and for 
a method of providing communications among applications 

USE - Sending messages in distributed communication system 

ADVANTAGE - Enabling use of newly added nodes or resources and 
handling removal of nodes or resources 

DESCRIPTION OF DRAWING (S) - The drawing is a simplified 
illustration of sending of message according to the invention 

Thread (T10) 

Message (MSG)

Function type (F10)

Information holding means (11) 

Distribution function (12) 
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Abstract (Basic) : US 5937431 A 

NOVELTY - A dynamic memory DM cache is utilized in each node of a 
shared memory as a backing store for data blocks discarded from the 
processor cache. The address binding to the DM is delayed from the 
block incoming time until the block discarding time when the blocks are 
discarded from the processor cache. 

DETAILED DESCRIPTION - The processor cache stores an address tag, a 
state of the block identifier, a local-global identifier and a data 
block. The DM cache stores an address, the state of the block 
identifier and a data block. 

USE - For data processing apparatus having memory access 
architecture . 

ADVANTAGE - Delays address binding to eliminate inclusion property 
between processor cache and local memory and to allow faster data 
access by avoiding cache only memory architecture COMA reliance on 
local memory as larger higher-level cache for processor cache. Can 
create more usable local memory space and reduce memory overhead, 
thereby allowing improvement in performance of distributed shared 
memory DSM architecture. 

DESCRIPTION OF DRAWING (S) - The drawing shows the illustration of 
the memory access mechanism in a dynamic memory architecture DYMA 
system. 
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Computer graphics system for processing geometric image data - has 
several geometry accelerators for parallel processing of vertex 
data into chunks of rendering data, for concentration and rasterising 
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Abstract (Basic) : US 5821950 A 

The system includes a number of geometry accelerators (32a-c) for 
processing vertex data, representative of graphics primitives, and 
providing rendering data. A distributor (30) is provided, which is 
responsive to a vertex data stream, for distributing chunks of the 
vertex data to the geometry accelerators, to provide chunks of 
rendering data. The distributor generates an end of chunk bit at the
end of a corresponding chunk of vertex data and distributes each end
of chunk bit to a geometry accelerator with the corresponding chunk of
vertex data. Each geometry accelerator transmits each end of chunk
bit from its input to its output.

The system also includes a concentrator (36) for receiving the 
chunks of rendering data and end of chunk bits from each geometry 
accelerator, and combining chunks of rendering data into a stream of 
rendering data, in response to an end of chunk bit. The stream of 
rendering data and stream of vertex data represent sequences of 
graphics primitives having the same order. The system has a rasterizer
(46), responsive to the rendering data stream, for generating pixel
data representative of a graphics display.
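
The way end of chunk bits let the concentrator restore the original primitive order can be sketched as below. This is an illustration under assumptions: the round-robin hand-out of chunks, the queue representation and the function names are invented for the example, and the "rendering" step is a placeholder string operation.

```python
from collections import deque


def distribute(vertex_stream, chunk_size, n_accel):
    """Split the stream into chunks and hand them out round-robin (assumed).

    Each item carries an end_of_chunk bit that is True on the last item
    of its chunk.
    """
    queues = [deque() for _ in range(n_accel)]
    for start in range(0, len(vertex_stream), chunk_size):
        chunk = vertex_stream[start:start + chunk_size]
        queue = queues[(start // chunk_size) % n_accel]
        for i, vertex in enumerate(chunk):
            queue.append((vertex, i == len(chunk) - 1))
    return queues


def accelerate(queue):
    """Placeholder geometry step; the end_of_chunk bit passes through."""
    return deque(("r(" + vertex + ")", eoc) for vertex, eoc in queue)


def concentrate(outputs):
    """Read one chunk from each accelerator in turn, switching on the bit."""
    stream = []
    current = 0
    while any(outputs):
        out = outputs[current]
        while out:
            item, end_of_chunk = out.popleft()
            stream.append(item)
            if end_of_chunk:
                break
        current = (current + 1) % len(outputs)
    return stream


vertices = ["v0", "v1", "v2", "v3", "v4", "v5", "v6"]
outputs = [accelerate(q) for q in distribute(vertices, chunk_size=2, n_accel=3)]
rendered = concentrate(outputs)
```

Because the concentrator visits the accelerators in the same order the distributor used and advances only on an end of chunk bit, the rendering stream keeps the order of the vertex stream even though the chunks were processed in parallel.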

ADVANTAGE - Achieves enhanced performance through use of
parallel processors. The order of primitives is not changed in the
parallel processing hardware.
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Hole inner wall monitoring apparatus for pipelines - performs image 
processing of electric signal output from each receiving element 
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Abstract (Basic): JP 2779713 B 

The apparatus has a number of receiving elements which are arranged
in the form of an array inside a bore-hole (2) drilled under the ground
(1). An ultrasonic wave is radiated from the transmission elements.
This ultrasonic wave is reflected by the internal surface of the hole and
is received by the receiving elements. The surface of the receiving
element array is divided in the form of triangles. The area of the
triangle arranged at outer side in hole axis direction is made more
than the area of triangle arranged at side direction of hole axis of
receiving element array. The image processing of an electric
signal output from each receiving element is performed.

ADVANTAGE - Shortens data processing time. Obtains image of small 
size and sufficient resolving degree. 
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Fading-resistant modulation method for wireless communication system - 
using time, space or frequency diversity for defined vector space using 
signal constellation points generated by orthogonal matrix transform 
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Abstract (Basic) : CA 2186688 A 

The method involves provision of an L-dimensional signalling
constellation formed from Q points. Each point represents a vector in a
vector space which has L orthogonal axes. Any two of the constellation
points are vectors which differ in a number of their components. Each
of the L components is transmitted over either L different antennae, L
different carrier frequencies or L different time slots. The signalling
constellation is obtained by applying an orthogonal transformation to
an L-dimensional hyper-cube, with the constellation points being its
vertices.

The transformation preserves Euclidean distances between the 
signalling constellation points and the signals corresponding to the 
signalling constellation components transmitted in each antenna, 
carrier frequency or time slot are differentially encoded. 

USE - E.g. digital cellular GSM. 

ADVANTAGE - Diversity provides high quality operation without 
reduction in spectral efficiency for Rayleigh fading. Mitigates power 
variations to reduce probability of errors in channel. Bandwidth 
efficient. Provides good performance when coding is ineffective due 
to slow fading. Has significant energy savings for given bit error rate 
when background white Gaussian noise is present. 
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Target program loop statement optimising - determining control omega 
value for iterative construct and converting def into data constraint 
using control omega value 
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Abstract (Basic) : EP 843257 A 

The method involves use of a loop statement having characteristics of
a single basic block loop. It requires detecting that the loop
statement contains at least one body statement that results in a def of
an unspillable resource. A control omega value is determined for the
iterative construct. The def is converted into a data constraint using
the control omega value. The iterative construct is then scheduled.
The method further entails allocating the unspillable resource
dependent on the data constraint. The unspillable resource may be a
predicate register. The loop statement results in a data dependency
graph; the def is represented by a def node in the data dependency
graph. During converting, it entails adding a self output arc to the
def node, and assigning the control omega value to the self output
arc.

USE - For optimising order of computer operation codes resulting 
from compilation of program loop. 

ADVANTAGE - Allows optimising single basic block loop within target 
program, thus, permits two or more instructions to be issued in single 
clock cycle within computer structure. 
Multimedia image processing system for large scale parallel array
processor e.g. supercomputer - comprises multiple processor arrays by
which different applications are performed in parallel manner
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Abstract (Basic) : JP 8171537 A 

The system comprises multiple processing units which are coupled
together in the form of a processor array. Several processor
arrays are connected repeatedly to form a processor mesh.

By the frequency division multiplexing technique, the general
purpose applications such as multimedia applications are performed by a
specific processor array and other applications are performed in
parallel by the other processor array.

ADVANTAGE - Enables easy modification of processor mesh. Increases 
number of node connections between processors. Enables maintenance of 
processing function easily. 
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N-dimensional parallel seismic data processing method - involves 
allocating processors as host node and worker nodes which are 
assigned in parallel to data divided into depth slices in memory 

Patent Assignee: INT BUSINESS MACHINES CORP (IBMC ) 
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Number of Countries: 001 Number of Patents: 001 
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Patent Details: 

Patent      Kind  Lan  Pg  Filing Notes  Application  Patent
US 5734829  A          13

Abstract (Basic) : US 5734829 A 

The data processing method involves allocating one of the
processors of the parallel computer system as a host node for
distributing the data. Multiple processors are allocated as worker
nodes for processing the data in parallel. The data is divided into
slices along one of the dimensions of the data representing depth. Each
slice is distributed to memory segments of the distributed memory.

Each worker node is assigned to memory segments of the
distributed memory to which a slice has been assigned, such that
contiguous slices are assigned to different worker nodes. Each worker
node processes a slice in parallel with and independent from the
slices being processed by other worker nodes. The worker nodes are
assigned to balance the load of processing the data among the worker
nodes.
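
One simple assignment that meets both conditions stated above (contiguous depth slices go to different worker nodes, and the load is balanced) is a cyclic distribution. The sketch below assumes that policy for illustration; the record does not say which scheme the patent uses, and the function name assign_slices is hypothetical.

```python
def assign_slices(num_slices, num_workers):
    """Assign depth slices to workers cyclically (assumed policy).

    Slice s goes to worker s % num_workers, so with two or more workers
    neighbouring slices always land on different workers.
    """
    if num_workers < 1:
        raise ValueError("at least one worker node is required")
    assignment = {worker: [] for worker in range(num_workers)}
    for s in range(num_slices):
        assignment[s % num_workers].append(s)
    return assignment


plan = assign_slices(num_slices=10, num_workers=3)
loads = [len(slices) for slices in plan.values()]
```

With ten slices and three workers the per-worker counts differ by at most one, which is the best balance available when every slice costs about the same to process.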

ADVANTAGE - Allows routine application of sophisticated 3D DMO 
processes. Can handle arbitrarily irregular surveys. 
Synchronous serial data processing computer in high speed packet
switching system, graphic patterning apparatus - performs sequential
processing to each bit input data which is serially forwarded between set
of nodes through switching network
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Abstract (Basic) : JP 10055350 A 

The computer has a switching network (1) which is connected with a 
serial input-output integer adder and multiplier (2,3). A serial input 
integer judgment node (4), switch node (5) and a register node 
(6) are also connected with the switching network. 

Serial forwarding of data is carried out between the nodes, which
are connected by a single wiring, through the switching network.
Sequential process is carried out to each bit of the input data.

ADVANTAGE - Simplifies wiring between nodes. Decreases amount of
hardware required. Attains various parallel processing easily.
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Three dimensional concealed graphical surface elimination 
processing device for high speed image processing - has geometrical 
processing unit which performs disposal of three dimensional 
graphical data based on output of rear surface processing unit 

Patent Assignee: SHARP KK (SHAF ) 
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Abstract (Basic) : JP 10031755 A 

The device has a graphic data memory unit (1) which stores
vertex co-ordinates of a polygon comprising three dimensional
graphics. A modeling conversion setting unit (2) outputs matrix
data for performing coordinate transformation of the three dimensional
graphic. A modeling conversion memory unit (4) stores the output of
the modeling conversion setting unit. A projection conversion setting
unit (3) sets the gaze vector of the perspective projection or the
parallel projection. The output of the projection conversion setting unit
is stored in a projection conversion memory (5).

Based on the contents of both the memories, a rear surface
processing unit (6) outputs a code corresponding to the product of the
normal line vector of the polygon and the gaze vector. A geometrical
processing unit (7) performs disposal of three dimensional
graphical data based on the output of the rear surface processing unit.

ADVANTAGE - Aims at acceleration of image processing by reducing 
number of polygons. Improves image processing speed. 
First order equation solution obtaining method for parallel processing
system - involves dividing analysis area into several sub areas and
assigning processor for every area related to shared node at
demarcation of sub area
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Abstract (Basic) : JP 10011421 A 

The method involves using numerical analysis, in which a mesh is 
used to calculate an area (1) for analysis. The area to be analysed is 
divided into sub areas. For every sub area, a processor is assigned in 
relation to a shared node at the demarcation of the sub area. Data 
communication takes place among the processors of the sub areas. 

The data required for convergence of matrix calculation is shared 
by all the processors. Each processor calculates the specific area 
using the coefficient matrix and the connection demarcation 
information on the divided sub areas. Thus the whole analysis area is 
calculated. 

ADVANTAGE - Performs large scale analysis within short time. Avoids 
need for producing matrix division list for every processor. Reduces 
memory capacity and time required for calculation. 
MIMD modular array processor architecture - has network interfaces
for linking arithmetic processors, node memories and control
processors, permitting each node to communicate with memories of other
nodes for load balancing, buffering data and operation as high-speed
DMA controllers
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Abstract (Basic) : US 5701482 A 

The modular array processor architecture (10) comprises
interconnected parallel processing nodes (11) that each comprise
a control processor (12), an arithmetic processor (13) having an input
port (22) for receiving data from an external source that is to be
processed, a node memory (14) that also comprises a portion of a
distributed global memory, and a network interface (15) coupled between
the control processor (12), the arithmetic processor (13), and the
node memory (14).

Data and control buses (17, 18) are coupled between the arithmetic
processors (13) and network interfaces (15) of each of the processing
nodes (11). Respective network interfaces (15) link each of the
arithmetic processors (13), node memories (14) and control processors
(12) together to provide for communication throughout the architecture
(10) and permit each node to communicate with the node memories
(14) of all other processing nodes (11). This linking, along with the
use of a heuristic scheduling algorithm, provides for load balancing
between the processing nodes (11). Data queues are segmented and
distributed across the architecture (10) in a way that the source and
destination nodes (11) process data locally in the memory (14), while
overflow is kept in distributed bulk memories (14). The network
interfaces (15) buffer data transferred over the data and control buses
(17, 18) to a respective node (11). Also, the network interfaces (15)
operate as high-speed DMA controllers to transfer data between the
arithmetic processor (13) and node memory (14) of a processing node
(11) independent of the operation of the control processor (12) in that
node (11). The control bus (17) is used to keep track of available
resources throughout the architecture (10) under control of a
heuristic scheduling algorithm that reallocates tasks to available
arithmetic processors (13) based on a set of heuristic rules to achieve
the load balancing. The data bus (18) is used to transfer data
between the node memories (14) so that reallocated tasks are
performed by selected arithmetic and control processors (13, 12) using
data that is stored locally.

USE - Can execute navy standard processing graph methodology. 

ADVANTAGE - High processing bandwidth. Processing and scheduling
capability grows linearly with additional nodes.
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Integral primary equation calculation method for parallel computer of 
distributed memory system - by performing elimination process for every 
row of triangular breakdown result of coefficient matrix and 
calculation result of equation right side which are both stored in 
processor memory 

Patent Assignee: HITACHI LTD (HITA ) 

Number of Countries: 001 Number of Patents: 001 

Patent Family: 

Patent No   Kind  Date      Applicat No  Kind  Date      Main IPC     Week
JP 9223123  A     19970826  JP 9630336   A     19960219  G06F-017/12  199744 B

Priority Applications (No Type Date) : JP 9630336 A 19960219 
Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 

JP 9223123 A 6 

Abstract (Basic) : JP 9223123 A 

The calculation method involves arranging the coefficient matrix
and equation right side in a distributed memory. The equation right
side undergoes calculation while the coefficient matrix undergoes a
triangular breakdown process.

Triangular breakdown and calculation results for every row are
stored in the memory of a processor so that the elimination process can be
performed. The final triangular or calculation result is transposed
to the other side of the equation.

ADVANTAGE - Processor operation rate is improved by accelerating 
calculation process for equation right side. 
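
As an illustration only (not taken from the patent), the idea of updating the equation right side in the same row sweep as the triangular breakdown can be sketched serially in Python. The function name solve_lu_with_rhs is hypothetical, and the sketch assumes a small dense system with nonzero pivots, so no pivoting or distribution across processors is shown.

```python
def solve_lu_with_rhs(a, b):
    """Solve a x = b, eliminating the right side in the same sweep as the matrix.

    Assumption: every pivot a[k][k] is nonzero (no pivoting is performed).
    """
    n = len(a)
    a = [row[:] for row in a]  # work on copies so the caller's data is kept
    b = b[:]
    for k in range(n):
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            a[i][k] = factor  # store the multiplier (lower-triangular factor)
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]  # right side updated alongside row i
    # Back substitution on the upper-triangular part.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x
```

Because the right side is already eliminated when the breakdown finishes, only the back substitution remains, which is one plausible reading of the stated advantage.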

Dwg. 1/8 

Title Terms: INTEGRAL; PRIMARY; EQUATE; CALCULATE; METHOD; PARALLEL;
COMPUTER; DISTRIBUTE; MEMORY; SYSTEM; PERFORMANCE; ELIMINATE; PROCESS;
ROW; TRIANGLE; BREAKDOWN; RESULT; COEFFICIENT; MATRIX; CALCULATE;
RESULT; EQUATE; RIGHT; SIDE; STORAGE; PROCESSOR; MEMORY
Derwent Class: T01 

International Patent Class (Main) : G06F-017/12 
International Patent Class (Additional) : G06F-015/16 
File Segment: EPI 



14/5/16 (Item 16 from file: 351) 

DIALOG (R) File 351: DERWENT WPI 






(c) 2000 Derwent Info Ltd. All rts. reserv. 

011286113 **Image available** 

WPI Acc No: 97-264018/199724 

XRPX Acc No: N97-218355 
Parallel computer system - has calculation node and input-output node 
provided with individual cache, which communicates with each other 
mutually 

Patent Assignee: HITACHI LTD (HITA ) 

Number of Countries: 001 Number of Patents: 001 

Patent Family: 

Patent No Kind Date Applicat No Kind Date Main IPC Week 

JP 9091261 A 19970404 JP 95266474 A 19950920 G06F-015/163 199724 B 

Priority Applications (No Type Date) : JP 95266474 A 19950920 
Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 

JP 9091261 A 9 

Abstract (Basic) : JP 9091261 A 

The system has an array of calculation nodes which are connected
in the form of a matrix. A cache is provided at both calculation and
input-output nodes. One or more cache nodes communicate with one
of the input-output nodes through a respective network. Similarly, one
or more calculation nodes communicate with one of the cache nodes.

When a calculation node reads data from the secondary
memory, a data demand message is transmitted to the cache node. If
the data is present in the cache, it is transmitted to the calculation
node. If the data is not present, it is read from secondary memory and
is then transmitted to the calculation node.

ADVANTAGE - Reduces processing waiting time. Avoids concentration 
of processing. Improves performance of parallel computer. 
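
The hit and miss handling described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the class name CacheNode, the dictionary standing in for secondary memory, and the hit and miss counters are all hypothetical and do not come from the patent.

```python
class CacheNode:
    """Toy cache node placed between calculation nodes and secondary memory."""

    def __init__(self, secondary):
        self.secondary = secondary  # dict standing in for secondary memory
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def request(self, block_id):
        """Handle a data demand message from a calculation node."""
        if block_id in self.cache:
            self.hits += 1
        else:
            # Miss: read from secondary memory, then keep a cached copy.
            self.misses += 1
            self.cache[block_id] = self.secondary[block_id]
        return self.cache[block_id]
```

A second request for the same block is then served from the cache node without touching secondary memory.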
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Computer network topology management and visualising system - maintains 
complex relationship between computer network elements to provide common 
database for storing node, type and view data 

Patent Assignee: SUN MICROSYSTEMS INC (SUNM ) 

Inventor: HSU W; KULKAMI A S; KULKARNI A S 

Number of Countries: 008 Number of Patents: 003 

Patent Family: 

Patent No Kind Date Applicat No Kind Date Main IPC Week 
EP 773649 A2 19970514 EP 96307993 A 19961105 H04L-012/24 199724 B 
JP 9266476 A 19971007 JP 96302018 A 19961113 H04L-012/24 199750 
US 5848243 A 19981208 US 95558274 A 19951113 G06F-015/16 199905 

Priority Applications (No Type Date) : US 95558274 A 19951113 
Cited Patents: No-SR.Pub 






Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 

EP 773649 A2 E 35 

Designated States (Regional) : DE FR GB IT NL SE 
JP 9266476 A 31 

Abstract (Basic): EP 773649 A 

The computer network has several network nodes and
interconnections. A network management system includes a database of
managed network resources. The database defines network nodes,
associated node types and associated views of the nodes. The system
is operable to modify the views based on user input changes in
attributes of the nodes. Network management users are arranged to
display views of said network using the network management database.

Preferably, the attributes of the nodes include parent
relationships. The system is arranged to form a new view node each
time a new parent is added to an attribute of a node. The system is
arranged to delete a view node each time a parent is deleted from
attributes of a node.

USE/ADVANTAGE - Allows maintenance and viewing of physical and 
logical network topology. Enables users to access data only through 
physical topology database, with both physical and logical topology. 
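
The rule that a view node is formed when a parent is added and deleted when a parent is removed can be illustrated with a small sketch. The class TopologyDB and its method names are hypothetical; representing a view node as a (parent, node) pair is an assumption made for illustration, not the patent's data model.

```python
class TopologyDB:
    """Toy topology database that keeps view nodes in step with parent attributes."""

    def __init__(self):
        self.parents = {}        # node -> set of parent nodes
        self.view_nodes = set()  # (parent, node) pairs, one per parent relationship

    def add_parent(self, node, parent):
        self.parents.setdefault(node, set()).add(parent)
        self.view_nodes.add((parent, node))      # new view node for the new parent

    def remove_parent(self, node, parent):
        self.parents.get(node, set()).discard(parent)
        self.view_nodes.discard((parent, node))  # view node goes with the parent
```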
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SIMD mesh parallel computer architecture for connection to host computer 
- has master processor element for broadcasting instructions to array of 
synchronous -execution slave processor elements, each contg. input-output 
processor section for routing data, and core processor 

Patent Assignee: MASSACHUSETTS INST TECHNOLOGY (MASI ) 

Inventor: GILBERT I H 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Main IPC Week 

US 5590356 A 19961231 US 94294757 A 19940823 G06F-013/00 199707 B 



Priority Applications (No Type Date) : US 94294757 A 19940823 
Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 

US 5590356 A 80 



Abstract (Basic) : US 5590356 A 

The Monolithic Synchronous Processor (Mesh-SP) processes data and
incorporates a mesh parallel computer architecture, primarily SIMD.
Each Mesh-SP processor node utilizes a single DSP processor element,
a large internal memory of at least 128k-bytes, and separately operable
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computational and 1-0 processing sections. 

The processor element provides data throughput of at least 120
MFlops. The processor is programmed in ANSI C without parallel
extensions. A combination of on-chip DMA hardware and system software
simplifies data I/O and inter-processor communication. A functional
simulator enables Mesh-SP algorithms to be coded and tested on a
personal computer.

USE/ADVANTAGE - Combines high data throughput with modest size,
weight, power and cost. Facilitates software development. Mesh-SP
appears to programmer as single computer which executes single program,
reducing programming complexity. Mesh-SP is programmed to solve wide
variety of computationally-demanding signal processing problems, e.g.
three-dimensional graphics or multi-dimensional signal processing,
neural networks, tomographic reconstruction, large Fourier transforms
and solving linear equations.
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Multidimensional spectral load balancing apparatus for circuit design - 
includes procedure which uses series of eigenvectors of Laplacian matrix 
of graph of problem to partition problem 

Patent Assignee: SANDIA CORP (SAND-N) 

Inventor: HENDRICKSON B A; LELAND R W 

Number of Countries: 001 Number of Patents: 001 

Patent Family: 
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Application Patent 
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Abstract (Basic) : US 5587922 A 

The parallel computational apparatus includes a series of 
computational units which are connected in pairs via data links. The 
data links define the connection topology of the parallel computer 
system. A procedure subdivides a given problem among the computational
units.

The procedure involves constructing a graph which corresponds to
the given problem. The graph includes a series of vertices, which
represent a corresponding series of computational tasks of the given
problem, and a series of weighted edges which represent information 
flow between the computational subtasks. A Laplacian matrix of the 
graph is generated and k eigenvectors of the matrix are computed. An 
orthogonal basis for a space spanned by the eigenvectors is selected. 
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The computational subtasks are partitioned into subsets using the 
eigenvectors. Each of the subsets is assigned to one of the
computational units in a manner consistent with the connection
topology.

USE/ADVANTAGE - Optimises parallel computer processing of
problem and minimises total pathway lengths of integrated circuits in 
design stage. 
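
A simplified sketch of the spectral step is given below. It is an assumption-laden illustration: it uses only one eigenvector of the Laplacian (the Fiedler vector), giving a two-way split, whereas the abstract describes using k eigenvectors. The function names, the shifted power iteration, and the median split are illustrative choices and are not taken from the patent. The input is assumed to be a symmetric weight matrix with a zero diagonal for a connected graph.

```python
def fiedler_vector(weights, iterations=500):
    """Approximate the Laplacian eigenvector of the smallest nonzero eigenvalue.

    weights: symmetric matrix of edge weights with zero diagonal (assumed connected).
    Uses power iteration on (shift*I - L), re-projecting away the constant vector.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    shift = 2.0 * max(degree)  # Gershgorin bound on the largest Laplacian eigenvalue
    x = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # stay orthogonal to the constant eigenvector
        # y = (shift*I - L) x with L = D - W
        y = [(shift - degree[i]) * x[i]
             + sum(weights[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    mean = sum(x) / n
    return [v - mean for v in x]


def spectral_bisect(weights):
    """Split vertices into two equal-sized halves by sorted Fiedler-vector value."""
    f = fiedler_vector(weights)
    order = sorted(range(len(f)), key=lambda i: f[i])
    half = len(order) // 2
    return set(order[:half]), set(order[half:])
```

For two triangles joined by a single edge, this split separates the triangles, cutting only the joining edge.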
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Coupled-vibration analysis for flow structure e.g. power, information, 
traffic - involves calculating compsn. matrix on whole motion equation
by parallel processing and analyzing combined shearing stress and
viscosity

Patent Assignee: HITACHI LTD (HITA ) 

Number of Countries: 001 Number of Patents: 001 

Patent Family: 

Patent No Kind Date Applicat No Kind Date Main IPC Week 

JP 8123852 A 19960517 JP 94264849 A 19941028 G06F-017/50 199630 B 

Priority Applications (No Type Date) : JP 94264849 A 19941028 
Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 

JP 8123852 A 11 

Abstract (Basic) : JP 8123852 A 

The method involves seeking the matrix M of the couplings C, K,
L1, L2, L3, L4, L5, H, D, and Q of a fluid structure by parallel
processing. A triangular decomposition is performed for the matrices M, Q,
K, and H. A matrix operation M^-1 K is done on the whole motion
equation and Q^-1 H is sought by parallel processing.

A process which seeks the flow velocity is performed and the flow
velocity is altered. The stable critical flow velocity is computed by
eigenvalue calculation. The shearing stress and the influence of
compression through viscosity are combined and analyzed.

ADVANTAGE - Improves analysis accuracy and shortens calculation
time.
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Multi-processor operation method for PICO system - involves storing root 
nodes on expansion queue for allocation to network with nodes
generating offspring for routing and message broadcasts
Patent Assignee: FMC CORP (FMCC ) 

Inventor: DIAMOND M D; KIMBEL J C; RENNOLET C L; ROSS S E 
Number of Countries: 001 Number of Patents: 001 
Patent Family: 
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US 5517654 A 19960514 US 92888936 A 19920526 G06F-015/18 199625 B 

Priority Applications (No Type Date): US 92888936 A 19920526 
Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 

US 5517654 A 21 

Abstract (Basic) : US 5517654 A 

The method involves storing a root node on a list of nodes on
an expansion queue in a node generation subsystem of the PICO system.
The expanded nodes are allocated to the network of multiprocessors
and the expanded nodes are hashed. Offspring are generated, forming
shadow nodes to occupy idle capacity of the network of
multiprocessors. A message subsystem, initialized by
the user, is used to refine the shadow nodes. This involves routing shadow
node offspring via a message channel subsystem in the PICO system.

Nodes are removed to curtail offspring generation. Bounds, values 
and upper and lower limits of the offspring are broadcast to compare 
with the root node and determine convergence. The broadcast is routed 
to deploy messages to the network of processors via a message channel 
subsystem in the PICO system. The PICO system is connected to the root 
processor via a state vector. Memory and shutdown operation of the PICO 
system are managed by an auxiliary function device. 

USE/ADVANTAGE - For enumerative and graph search problems.
Provides near 100% processor utilisation in multiprocessor network.
Eliminates idle process capacity by storing shadow nodes in idle
processors.
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Parallel processing network - has pivot switches for performing 
communication between cluster of processors matrix sequence 
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Abstract (Basic): JP 8077128 A 

The network has several set pivot switches (34) connected to 
several node switch settings (30,32) respectively forming a cluster 
of processor. The cluster of processor matrix sequence and line are 
interconnected through a circuit. All pivot switches line are 
interconnected to the same line of the node switch settings. 

The communication between the clusters of processors is performed
through the pivot switches. A circuit does not directly connect
processing nodes (24,26).

ADVANTAGE - Arranges large-scale connection between processor 
array node switch groups along with line and sequence by utilising 
switch setting added to processing node forming cluster of processor. 
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Node coupling system for LAN, WAN - groups node based on dimensional 
co-ordinates , with secondary nodes mounted on substrate 
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Abstract (Basic) : JP 7200508 A 

The node coupling system has an element processor provided with n 
dimension mesh (00,01,02...) to form a link. A cross bar switch or a 
processor is connected to secondary nodes (S0-S3). The nodes are
grouped based on the dimensional co-ordinates, to form n-1 meshes. The 
secondary nodes are mounted on a substrate. 

ADVANTAGE - Gives high mounting ease, number extendibility,
communication-link bandwidth and random communication performance.
Increases matrix multiplication efficiency and overall performance.
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Data flow microprocessor with vector operation function - has program 
unit assigning destination node number to operands using data flow 
graph with stored operands given to operation unit 
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Abstract (Basic) : US 5404553 A 

The processor has an I/O interface unit transferring data with an 
operand to and from an external unit. A circuit generates a pair of the 
operands by detecting data packets with coincident destination node 
numbers. An operation unit receives the operands and produces a result 
depending on an attached instruction code and transfers the result to 
the I/O interface unit. A program memory unit operates on the operands
by reading its stored data flow graph.

The destination node number attached to the operands is addressed
as an input address. The destination node number and tag information
are updated using the data flow graph. The I/O interface unit, operand
generating circuit, program memory unit and data memory are connected 
in a ring shape. When the instruction code attached to the data input 
to the data memory is a predetermined instruction, previously stored 
operands are given with the code to the operation unit and operated on 
in sequential order. 

ADVANTAGE - Provides complete control with small number of simple
instructions. Capable of executing program at high efficiency while
achieving high vector operation performance.
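
The operand-pairing step, in which packets with coincident destination node numbers are matched, can be sketched as below. This is only an illustration: the function name match_operands and the (destination, value) packet layout are assumptions, and the real circuit also carries tag information and instruction codes that are omitted here.

```python
def match_operands(packets):
    """Pair data packets that carry the same destination node number.

    packets: iterable of (destination_node, value) tuples, in arrival order.
    Returns (pairs, waiting): matched (destination, first, second) triples and
    the operands still waiting for a partner.
    """
    waiting = {}
    pairs = []
    for dest, value in packets:
        if dest in waiting:
            # A partner arrived earlier for this destination: emit the pair.
            pairs.append((dest, waiting.pop(dest), value))
        else:
            waiting[dest] = value
    return pairs, waiting
```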
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Draw processor for high performance 3- D graphics accelerator - 
performs scan edgewalking and scan interpolation functions to render 3-D
geometry object defined by draw packet
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Abstract (Basic) : EP 631252 A 

The draw processor has a geometry pipeline interface circuit 
receiving a draw packet over a bus from a floating point processor. The 
draw packet contains a set of geometry parameters that define a 
geometry object including high level screen space descriptions of 2-D
and 3-D point, line and area graphics primitives. The interface
adjusts the geometry parameters according to an interleave value 
corresp. to the draw processor. A rendering circuit receives the 
parameters and generates a pixel set corresp. to the object by 
performing edgewalking and scan interpolation functions according to 
the geometry parameters. 

A direct port interface receives a direct port packet over the draw
bus from a command pre-processor. The direct port packet contains a set
of pixel function parameters that control at least one pixel function 
of the draw processor. A memory control circuit receives the pixels and 
the pixel function parameters and writes the pixels into a frame memory 
buffer whilst performing the pixel function. 

USE/ADVANTAGE - Fast operation esp. for tessellated geometry. High 
rendering performance quality. 
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Floating point processor for three- dimensional graphics accelerator 
- has specialised graphics micro instructions for hardware re-mapping
general purpose registers to sort triangle vertices
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Abstract (Basic) : EP 627682 A 

The floating-point processor for a high-performance
three-dimensional graphics accelerator in a computer system implements
specialised graphics micro instructions. The specialised graphics
micro instructions include a swap micro instruction which causes a
hardware re-mapping of general purpose register groups to sort
triangle vertices. The graphics micro instructions also include
specialised conditional branches for three-dimensional geometry.

The circuitry includes a multiple buffer input register file, 
multiple buffer output register file, and control sequencer for 
assembling the draw packet using floating-point compare and swap micro 
instructions. The swap micro instruction rearranges a register map for 
the first, second and third register groups. 

ADVANTAGE - Improves graphics accelerator performance while 
minimising costs. 
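
The effect of a swap that re-maps register groups instead of copying their contents can be sketched in software. This is a hypothetical illustration: the RegisterFile class, the three-step compare-and-swap sequence, and the choice to sort by the y coordinate are assumptions, since the abstract does not state the sort key.

```python
class RegisterFile:
    """Toy register file whose logical groups are re-mapped rather than copied."""

    def __init__(self, groups):
        self.physical = list(groups)              # contents never move
        self.mapping = list(range(len(groups)))   # logical index -> physical index

    def read(self, logical):
        return self.physical[self.mapping[logical]]

    def swap(self, a, b):
        # Exchange two map entries; no vertex data is transferred.
        self.mapping[a], self.mapping[b] = self.mapping[b], self.mapping[a]


def sort_vertices_by_y(regs):
    """Order three vertex groups by y using compare-and-swap steps."""
    for a, b in ((0, 1), (1, 2), (0, 1)):
        if regs.read(a)[1] > regs.read(b)[1]:
            regs.swap(a, b)
```

After sorting, reads through the logical indices return the vertices in order while the underlying storage is unchanged.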
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Booting mode in distributed digital data processing system - performing 
boot retrieval operation in response to receipt by host of initiate boot 
image transfer request 
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Abstract (Basic) : US 5367688 A 

A distributed digital data processing system includes a host and at 
least one node interconnected by a communications link. In response 
to a boot command, the node requests its boot image from the host 
over the communications link. The host then provides pointers to 
portions of the boot image to the node. The node then retrieves
the portions of the boot image identified by the pointers. These
operations are repeated until the node retrieves the entire boot image.
By having the host supply pointers to the boot image and the
node perform the retrieval operations in response to the pointers, the
host is freed to perform other operations while the node is actually
performing the retrieval operations. 

USE - Enabling booting of intelligent node connected to host 
system. 
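
The pointer-driven retrieval loop can be sketched as follows. All names here are hypothetical, and the chunk size, batch size, and the use of a bytes object to stand in for host storage reachable over the communications link are assumptions made for illustration.

```python
def host_pointers(image_len, start, chunk, batch):
    """Host side: return up to `batch` (offset, length) pointers from `start`."""
    pointers = []
    offset = start
    while offset < image_len and len(pointers) < batch:
        length = min(chunk, image_len - offset)
        pointers.append((offset, length))
        offset += length
    return pointers


def node_boot(host_storage, image_len, chunk=4, batch=2):
    """Node side: request pointers, fetch the pieces, repeat until complete."""
    image = bytearray()
    while len(image) < image_len:
        # The host only hands out pointers; the node does the actual transfers.
        for offset, length in host_pointers(image_len, len(image), chunk, batch):
            image += host_storage[offset:offset + length]
    return bytes(image)
```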
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Self- timed mesh routing chip with data broadcasting - passes message 
contg first and second data words from first processor node to first 
message routing device 
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Abstract (Basic) : US 5333279 A 

The appts providing for data broadcasting in a two dimensional mesh 
of processor nodes is disclosed. In accordance with the present 
invention, a self-timed message routing chip is coupled to each 
processor node, thereby forming a two dimensional mesh of message
routing chips. Broadcasting originates from a corner node, and data
can broadcast through the mesh routing chips to a row, a column, or a
matrix of nodes.

The mesh routing chips, together, form a self-timed pipeline with 
each individual message routing chip having broadcasting hardware which 
provides for the forking of a message within that particular message 
routing chip. The self-timed forking of a message within individual 
message routing chips directly supports data broadcasting within the 
two dimensional mesh. 

USE/ADVANTAGE - For routing and broadcasting data in
two-dimensional mesh of processor nodes. Significant reduction of
time required for broadcast task, hence improving performance of
entire parallel processing system.
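
One possible broadcast pattern from a corner node is sketched below: the message travels along the first row, and each node in that row forks a copy down its column. The row-then-column order and the function name are assumptions for illustration; the abstract states only that forking happens inside the routing chips.

```python
def broadcast_from_corner(rows, cols):
    """Return the hop count at which each mesh node receives the broadcast.

    Assumed pattern: the message moves along row 0 from node (0, 0), and each
    row-0 node forks a copy down its own column.
    """
    arrival = {(0, 0): 0}
    for c in range(1, cols):
        arrival[(0, c)] = arrival[(0, c - 1)] + 1
    for c in range(cols):
        for r in range(1, rows):
            arrival[(r, c)] = arrival[(r - 1, c)] + 1
    return arrival
```

Under this pattern the farthest node in a rows-by-cols mesh is reached after (rows - 1) + (cols - 1) hops.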
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Parallel scalable internet working unit architecture - employs two 
network controllers, foreground and background buffer controller, both 
with local memory, node processor and buffer memory attached to IWU 
with individual PMI communicating with FGAM 
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Abstract (Basic) : EP 604341 A 

The system has a memory (112) for storing packets, and a background 
buffer controller coupled to the packet memory for organising and 
maintaining the packets in memory. A foreground buffer controller
(FGAM), coupled between the foreground unit and the packet memory,
transfers packets to and from the background unit (BGAM).

Packets transferred from one of the networks to the packet memory, 
and from the packet memory to one of the networks pass through the 
foreground unit. A node processor (NP) is coupled to the foreground 
and background unit to access packets from the memory via the 
background unit. 

ADVANTAGE - Use of decentralisation of overall control of buffers 
by creating front end buffer controller allows for greater parallel 
processing of data transfer and control as well as greater 
scalability. 
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Distributed data processing system - has several resources , user 
processes performing transactions accessing resources, resource 
manager responsive to lock requests, transaction manager storing wait-for
graph, and cyclic chain of dependencies detector
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Abstract (Basic) : EP 595453 A 

The distributed data processing system includes a distributed
resource manager which detects dependencies between transactions caused
by conflicting lock requests. A distributed transaction manager stores a
wait-for graph with nodes representing transactions and edges connecting the
nodes and representing dependencies between the transactions.

Each edge is labelled with the identities of the lock requests that 
caused the dependency. The transaction manager propagates probes 
through the graph to detect cyclic dependencies, indicating deadlock. 
ADVANTAGE - Improved deadlock detection and resolution. 
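
Detecting a cyclic dependency in a wait-for graph can be illustrated with a simplified, centralized sketch. The patent describes propagating probes through a distributed graph; the depth-first search below is a swapped-in stand-in for that mechanism, and the dictionary layout (transaction to blocking transaction to lock request label) is an assumption.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none.

    wait_for: {transaction: {blocking_transaction: lock_request_id}}
    """
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}
    stack = []

    def visit(txn):
        state[txn] = GREY
        stack.append(txn)
        for blocker in wait_for.get(txn, {}):
            colour = state.get(blocker, WHITE)
            if colour == GREY:
                # Reached a transaction already on the current path: deadlock.
                return stack[stack.index(blocker):]
            if colour == WHITE:
                cycle = visit(blocker)
                if cycle:
                    return cycle
        stack.pop()
        state[txn] = BLACK
        return None

    for txn in list(wait_for):
        if state.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None
```

The edge labels identify which lock requests would have to be withdrawn to break a reported cycle.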
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Fault™ tolerant mesh with spare nodes in parallel or network 
architecture for massively parallel computer or other element array - 
adds spare components (nodes) and extra links (edges) to given target
mesh of small degree so architecture can be reconfigured as operable
target mesh in the presence of up to k faults, regardless of their
distribution
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Abstract (Basic) : US 5271014 A 

The network architecture tolerates up to k faults in a
d-dimensional mesh architecture based on the approach of adding spare
components (nodes) and extra links (edges) to a given target mesh
where m spare nodes (m ≥ k) are added and the maximum number of links
per node (degree of the mesh) is kept small. The resulting
architecture can be reconfigured, without the use of switches, as an 
operable target mesh in the presence of up to k faults, regardless of 
their distribution. 

Given a d-dimensional mesh architecture having N = n1 × n2 × ... × nd
nodes, the fault-tolerant mesh
can be represented by a diagonal or circulant graph having N+m-k
nodes, where m ≥ k. This graph has the property that given any set of k
or fewer faulty nodes, the remaining graph, after the performance
of a pre-determined node renaming process, is guaranteed to contain
as a subgraph the graph corresponding to the target mesh M so long as
d ≥ 2 and nd ≥ 3. The fault-tolerant mesh allows a healthy target mesh to be
located in the presence of up to k faulty network components.

USE /ADVANTAGE - Low redundancy cost handling of faults in mesh 
architectures, giving higher yield eg in WSI array mfr. 
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Re- configurable signal processor performing concurrent computations - 
realises generic capability for fault-tolerant and re-configurable
multi-processor computer scalable to thousands of processor elements
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Abstract (Basic) : GB 2262175 A 

In the system, a controlled assembly of processing elements is
interconnected as processing nodes. A desired topology of nodes is
embedded into a fixed lattice, comprising a remote command Host and
multiple processor elements arrayed in one or more matrices of nodes.
Each element has multiple exterior ports accessing the processing
capability of the element, and effects signal routing within each
processor element between its processing capability and any of the
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multiple ports . 

For blocking signal routing at selected ports, selected ports of 
the elements in each matrix are connected to selected ports of 
neighbour elements, and for connecting selected ports of designated 
elements either to selected element ports in a further matrix of 
processor elements or to the Host. The Host conditions the element
ports to direct signals to and from only selected ones of each element's
neighbouring processor elements, the conditioning means achieving a
desired interconnection topology for the nodes of the system.

USE /ADVANTAGE - Enables or disables nodes as necessary by 
revising communication paths. Adds steps to application program to 
convey idealised or nominal system configuration. 
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Multiprocessor scientific visualisation system - includes number of 
processor nodes, each including data processor which generates buffers 
byte enable signals and control signals 
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Abstract (Basic) : EP 524683 A 

The system comprises a number of processor nodes each including a
data processor (22a, 28a) and a device, coupled to each of the nodes,
for buffering data written by the associated data processor to a first
bus (23c), prior to the data being transmitted to a second bus (32). A
device, coupled to each of the nodes, buffers byte enable signals
generated by the associated data processor in conjunction with the data
written by the data processor. A device transmits the buffered data to
the second bus, the transmitting device including a device responsive
to the stored byte enable signals, for also transmitting a control
signal to the second bus for indicating if a memory write operation is
to be accomplished as a read-modify-write type of operation.

A device couples the data, the control signal, and the byte enable 
signals from the second bus to a third bus (24) for reception to a 
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memory shared by all of the data processors. 

USE/ADVANTAGE - High performance multiprocessor system.
Efficiently utilises shared resources.
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Centralised and distributed wait depth limited concurrency control - 
taking into account progress made by translations in conflict resolution 
in restarting translations by taking account of wait depth tree compared 
with predetermined value 
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Abstract (Basic) : EP 501025 A 

A wait depth data structure is maintained which graphically 
describes a waiting depth of transactions being processed by the system 
where, for each transaction, a real-valued function provides a measure
of current length of a transaction.

For each request for a lock, the wait depth data structure is 
tested for exceeding a predetermined value. The real-valued function is 
used to determine and restart the subset of transactions in the case of 
conflict between transactions so that the wait depth is reduced or kept 
below a predetermined value. 

USE/ADVANTAGE - Concurrency control in multi-user data 
processing environment. Minimises unnecessary lock conflicts. Restricts 
depth of waiting tree. Avoids throughput limitation and deadlock 
detection problems under conditions of high data contention. 
Dwg. 1/9 

Title Terms: CENTRE; DISTRIBUTE; WAIT; DEPTH; LIMIT; CONTROL; ACCOUNT;
PROGRESS; MADE; TRANSLATION; CONFLICT; RESOLUTION; RESTART; TRANSLATION;
ACCOUNT; WAIT; DEPTH; TREE; COMPARE; PREDETERMINED; VALUE
Derwent Class: T01 

International Patent Class (Main) : G06F-015/40; G06F-015/403 
File Segment: EPI 



14/5/35 (Item 35 from file: 351) 

DIALOG (R) File 351: DERWENT WPI 

(c) 2000 Derwent Info Ltd. All rts. reserv. 



009074435 **Image available** 

WPI Acc No: 92-201854/199225 

XRPX Acc No: N92-152755 
Computer system graphically configuring data process 
ions to enable users to define multiple network node 
associated with them and connections between them 
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Abstract (Basic) : EP 490624 A 

The computer system graphically represents a network of three or
more nodes by defining network objects for the nodes, and
graphically defines connections to relate the network objects. The
system also automatically generates parameters to configure a physical
network as defined by the network objects and connections.

Preferably, the user defines multiple network work station nodes
using icons (13), specifies the resources associated with each icon
(12), and defines connections between icons using specified protocol
constraints. The computer validates the network so defined and
generates the associated configuration files for the respective work
station nodes. The configuration files for the respective work
stations in the network are preferably distributed and installed using
the network resources. The network topology information so created
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can be stored, retrieved and modified as necessary. 
ADVANTAGE - Meets needs of evolving network.
Title Terms: COMPUTER; SYSTEM; GRAPHICAL; DATA; PROCESS; NETWORK; ION;
ENABLE; USER; DEFINE; MULTIPLE; NETWORK; NODE; RESOURCE; ASSOCIATE;
CONNECT
Derwent Class: T01 

International Patent Class (Main) : G06F-003/00; G06F-013/00; G06F-015/16 
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Computational model for dynamically configured systems - has various 
processing components created dynamically interfaced to each other and 
scheduled upon demand 

Patent Assignee: UNIV VANDERBILT (UYVA-N) 

Inventor: BIEGL C; KARSAI G; SZTIPANOVITS J 

Number of Countries: 017 Number of Patents: 003 
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WO 9208196 Al 19920514 WO 91US7397 A 19911007 G06F-013/14 199222 B 

AU 9188457 A 19920526 AU 9188457 A 19911007 G06F-013/14 199235 

WO 91US7397 A 19911007 
US 5249274 A 19930928 US 90602961 A 19901024 G06F-015/16 199340 
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Designated States (Regional) : AT BE CH DE DK ES FR GB GR IT LU NL SE 
AU 9188457 A Based on WO 9208196 
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Abstract (Basic) : WO 9208196 A 

A knowledge-based interpreter generates executable code to
represent the engineering application of the scheduling and apparatus
for model-based dynamically configured systems. The interpreter
configures the final system from elementary building blocks such as
signal processing routines or controller modules. The system
configuration is generated dynamically from the model. The model and
the system can be modified during system operation to reflect changes
in the environment.

The central structure of the system is represented by models or
graphs which are built up of actor nodes, data nodes and
connection specifications. Each actor node is associated with a
particular computational unit and with a local data structure. The
actor nodes perform transformations on data streams by running an
application module. Data nodes store either raw data, data produced
by actor nodes or point to data.

A cross correlator takes samples (10, 12) and performs fast
Fourier transforms (14) and squares (16). An inverse fast Fourier
transform (18) is averaged (20,22) and the result displayed (24).
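
The actor node / data node model described in this abstract can be illustrated with a minimal Python sketch. The class names, the `demand` function and the two-stage example chain are hypothetical and are not taken from the patent; the sketch only shows demand-driven scheduling, where an actor runs when one of its output values is requested.

```python
class DataNode:
    """Holds a value: raw input data or data produced by an actor node."""
    def __init__(self, value=None):
        self.value = value
        self.producer = None  # actor node that writes this data node, if any


class ActorNode:
    """Transforms the values of its input data nodes into one output data node."""
    def __init__(self, func, inputs, output):
        self.func = func
        self.inputs = inputs
        self.output = output
        output.producer = self


def demand(node):
    """Demand-driven scheduling: run a producer only when its value is requested."""
    if node.value is None and node.producer is not None:
        args = [demand(src) for src in node.producer.inputs]
        node.value = node.producer.func(*args)
    return node.value


# Hypothetical two-stage chain: square each sample, then average.
samples = DataNode([1.0, 2.0, 3.0])
squared = DataNode()
mean = DataNode()
ActorNode(lambda xs: [x * x for x in xs], [samples], squared)
ActorNode(lambda xs: sum(xs) / len(xs), [squared], mean)
result = demand(mean)
```

Requesting `mean` pulls `squared` first, so each actor runs at most once and only when its output is needed.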
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ADVANTAGE - Improved performance of data driven and demand 
driven. 

Dwg.1/6
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Symmetric linear system solving device using super- computer - perform 
vector processing at high-speed with reduced memory requirement over
conventional scalar method
Patent Assignee: NEC CORP (NIDE)
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Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 
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Abstract (Basic) : EP 461608 A 

The device solves a symmetric linear system, typically represented
by the matrix equation Au=b, where the only input data that must be
prepared are the right hand side vector of the equation, a diagonal
matrix and either an upper or lower triangular matrix.

The device is supplied with the two-dimensional arrays AA and JA, 
and the one-dimensional array B. Following pointer array construction 
and matrix decomposition, iteration in conjunction with first and 
second product vectors yields the solution vector u.
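
As a rough illustration of solving Au=b from only the diagonal and one triangle of a symmetric matrix, the Python sketch below uses a plain conjugate gradient iteration. This is a swapped-in textbook method, not the decomposition described in the abstract. The storage layout is also an assumption: `diag` holds the diagonal, and `AA[i]` / `JA[i]` hold the values and column indices of the strict upper triangle of row i (ragged lists are used for brevity).

```python
def sym_matvec(diag, AA, JA, x):
    """y = A x, using only the diagonal and the strict upper triangle."""
    y = [d * xi for d, xi in zip(diag, x)]
    for i, (vals, cols) in enumerate(zip(AA, JA)):
        for a, j in zip(vals, cols):
            y[i] += a * x[j]  # stored upper-triangular entry A[i][j]
            y[j] += a * x[i]  # mirrored entry A[j][i], never stored
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(diag, AA, JA, b, tol=1e-12, max_iter=100):
    """Solve A u = b for symmetric positive definite A; tol bounds the squared residual."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr < tol:
            break
        Ap = sym_matvec(diag, AA, JA, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


# Example: A = [[4,1,0],[1,3,1],[0,1,2]], b = A * [1,2,3].
diag = [4.0, 3.0, 2.0]
AA = [[1.0], [1.0], []]
JA = [[1], [2], []]
b = [6.0, 10.0, 8.0]
u = conjugate_gradient(diag, AA, JA, b)
```

Storing one triangle roughly halves the memory needed for the off-diagonal entries, which is the kind of saving the abstract claims.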

ADVANTAGE - Requires less memory in supercomputer than 
conventional calculation method, by utilising high-speed vector 
processing technique. (12pp Dwg. No. 2/6) 
Title Terms: SYMMETRICAL; LINEAR; SYSTEM; SOLVING; DEVICE; SUPER; COMPUTER; 
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Multiprocessor with crossbar between processors and memories - 
establishes processor memory links individual processors switch and 
memories on single silicon chip 
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Designated States (Regional) : DE FR GB IT NL 
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Based on 



US 5471592
US 5471592
EP 429733

Abstract (Basic) : EP 429733 A 

The multi-processing system has n processors, each operable from 
instruction sets provided from a memory source for controlling a number 
of different processes, said processes relying on the movement of data 
to or from one or more addressable memories. There are m memory sources,
each having a unique addressable space, where m is greater than n. A
switch matrix is
connected to the memories and is connected to the processors. The 
switch matrix is enabled on a processor cycle by cycle basis for 
interconnecting any of the processors with any of the memories for the 
interchange between the memories and the connected processors of 
instruction sets from one or more addressable memory spaces and data 
from other addressable memory spaces. 

A common instruction set is capable of operating w.r.t. each other 
in a parallel processing capacity from the same or different 
instruction streams from the common instruction set and one other 
processor operable with a different instruction set. 

ADVANTAGE - High operational flexibility. (153pp Dwg.No.1/61)
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Processor array system with an SIMD architecture - has sub-array 

modules, each module having 32 processing elements, byte-wide arithmetic
unit and multi-byte shift network
Patent Assignee: AMT HOLDINGS LTD (AMTH-N) 
Inventor: HUNT D 

Number of Countries: 013 Number of Patents: 001 
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Abstract (Basic) : EP 428327 A 

The processor array system employing an SIMD architecture
comprises a number of sub-array modules (S1...S4). Each sub-array
includes n=32 processing elements (PE). Each processing element is
connected to a local store comprising on-chip memory; each chip is
connected by an m-bit wide path (where m is greater than 1) to a block
region of off-chip memory. The m-bit wide path is selectively
configurable as a one-bit path to or from each of m processor elements,
or as an m-bit wide path arranged to communicate complete m-bit words
of memory data between the region of off-chip memory and respective
processing elements.

Each processing element includes a byte-wide arithmetic unit (ALU) 
and byte-wide data paths for carrying data between the ALU and the on 
chip memory; each processing element further includes a four byte wide 
32 bit operand shift network (Q) comprising a byte-wise shift network
(Q1), and a bit-wise shift network (Q2) and an output register (QO).
Such processor array system is pref. connected to a host processor
arranged to address the array as an extension of its own memory, via a 
scalar processor interface (MCU) for controlling the operation of the 
array. 

USE/ADVANTAGE - Parallel processing computer systems, scan
array system with SIMD architecture; significant improvement in
performance of system when handling matrices, and corner turning
can be carried out in transit between off-chip memory and the 
processing element using an n bit shift register, and arranging the 
off-chip memory in horizontal mode with a word length equal to number 
of processing elements. (Dwg.No.1/5) 
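
Corner turning, mentioned above, converts between word-oriented and bit-serial layouts. The Python sketch below shows the idea in software as a bit-matrix transpose. The function name and the sample data are illustrative assumptions, and the patent performs the operation in hardware with a shift register rather than in a loop.

```python
def corner_turn(words, m):
    """Turn n m-bit words into m bit-planes of n bits each.

    Bit i of plane b equals bit b of words[i], so each processing
    element i sees its own word one bit at a time.
    """
    planes = [0] * m
    for i, w in enumerate(words):
        for b in range(m):
            planes[b] |= ((w >> b) & 1) << i
    return planes


# Four 4-bit words, one per (hypothetical) processing element.
words = [0b0001, 0b0011, 0b0111, 0b1111]
planes = corner_turn(words, 4)
```

Because the operation is a transpose, applying it twice to a square block returns the original words.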
Title Terms: PROCESSOR; ARRAY; SYSTEM; SIMD; ARCHITECTURE; SUB; ARRAY; 
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Graphics display system parametric curve evaluation method - stores 

NURBS data as sequence of records used to evaluate coordinates of
determined parameter points along the curve
Patent Assignee: IBM CORP (IBMC); INT BUSINESS MACHINES CORP (IBMC)
Inventor: LUKEN W L 

Number of Countries: 005 Number of Patents: 003 
Patent Family: 

Patent No Kind Date Applicat No Kind Date 
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Abstract (Basic) : EP 425174 A 

The method of converting NURBS data representative of a parametric
curve into geometric coordinates of vertices of a polyline for
subsequent display, the curve being composed of successive spans, 
involves organizing and locating the data in memory as a sequence of 
data records. A first subset of the sequence defines a first span of 
the curve with each successive record defining a corresponding span. 

The first set of data records are read and used to evaluate the 
coordinates of determined parameter points along the first span of the 
curve, with successive points evaluated from successive records.
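
To make the conversion from NURBS data to polyline vertices concrete, here is a minimal Python sketch using the textbook Cox-de Boor recursion. It does not reproduce the per-span record layout of the patent. The function names and the quarter-circle test data are assumptions chosen for illustration.

```python
import math


def basis(i, p, u, knots):
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    d2 = knots[i + p + 1] - knots[i + 1]
    if d1 > 0:
        left = (u - knots[i]) / d1 * basis(i, p - 1, u, knots)
    if d2 > 0:
        right = (knots[i + p + 1] - u) / d2 * basis(i + 1, p - 1, u, knots)
    return left + right


def nurbs_point(u, degree, knots, ctrl, weights):
    """Evaluate the rational curve: weighted basis sum divided by the weight sum."""
    num = [0.0] * len(ctrl[0])
    den = 0.0
    for i, (pt, w) in enumerate(zip(ctrl, weights)):
        nw = basis(i, degree, u, knots) * w
        den += nw
        num = [a + nw * c for a, c in zip(num, pt)]
    return [a / den for a in num]


def to_polyline(degree, knots, ctrl, weights, steps):
    """Sample the curve at evenly spaced parameter values."""
    lo, hi = knots[degree], knots[-degree - 1]
    # Scale slightly so the last sample stays inside the half-open final span.
    us = [lo + (hi - lo) * (k / steps) * (1 - 1e-12) for k in range(steps + 1)]
    return [nurbs_point(u, degree, knots, ctrl, weights) for u in us]


# Quarter of the unit circle as a single quadratic NURBS span.
polyline = to_polyline(
    2, [0, 0, 0, 1, 1, 1],
    [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    [1.0, math.sqrt(2) / 2, 1.0],
    8,
)
```

Every vertex of `polyline` should lie on the unit circle, which gives a simple check that the rational weights are applied correctly.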
USE/ADVANTAGE - Evaluating and rendering curves for computer 
graphics display system offers high performance, good numerical
stability, cost effectiveness, high speed and accuracy and has the
advantages of NURBS. (26pp Dwg.No.4/11F)
Title Terms: GRAPHIC; DISPLAY; SYSTEM; PARAMETER; CURVE; EVALUATE; METHOD
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Multiprocessor system for graphic data processing - performs display 
processing in all processors using associated image memory regions 

Patent Assignee: FRAUNHOFER-GES FORD ANGE (FRAU ); FRAUNHOFER GES 
FOERDERUNG (FRAU ) 

Inventor: HAAKER T; JOSEPH H; SELZER H 
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Patent Details : 
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Abstract (Basic) : DE 3924759 A 

A multiprocessor system for graphical processing performs 
geometric processing of geometric objects in several processors and 
forms image point data values in an output region from transformed 
coordinate values. The data values are placed in an image memory. 
The processors perform the display processing whereby all 
processors simultaneously contain the results of the geometric 
object for image processing and the image point data associated 
with each processor are placed in an associated memory region. 

ADVANTAGE - The multiprocessor system eliminates data sensitivity 
and dynamically allocates processing power to ensure no processing 
capacity is wasted. (9pp Dwg. No. 5/5) 
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Single layer parallel distributed processing network - has number of 

nodes with inputs and outputs connected to head to tail forming weighted 
circuit defined by matrix 
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Abstract (Basic) : WO 9015390 A 

The network (10) consists of a matrix of N nodes in the form of
amplifiers (12) which represent an N by N information storage matrix,
A. The output port of each amplifier (20) is connected to a circuit
which performs synaptic squashing (26). The output of the squasher (26)
is connected to either the inverting port (16) or noninverting port
(18) of some or all of the amplifiers (12) in the network (10)
according to the sign of the value of the corresp. element in the
matrix, A. The absolute value of the element is set by each connectivity
resistor (34).

A similarity transformation matrix, T, is an N by N matrix
whose columns are formed from a number of system output vectors and 
arbitrary vectors. The matrix product of A and T must equal the 
matrix product of T and a matrix of the eigenvalues of A. 
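
The condition stated above, that the product of A and T equals the product of T and the eigenvalue matrix of A, can be checked numerically. The Python sketch below uses a hypothetical 2 by 2 example that is not taken from the patent; `L` stands for the diagonal matrix of eigenvalues.

```python
def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


A = [[2, 1], [1, 2]]    # storage matrix (illustrative values)
T = [[1, 1], [1, -1]]   # columns are eigenvectors of A
L = [[3, 0], [0, 1]]    # eigenvalues of A on the diagonal

lhs = matmul(A, T)      # A T
rhs = matmul(T, L)      # T L
```

Here both products equal [[3, 1], [3, -1]], so the chosen T satisfies the stated relation for this A.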

ADVANTAGE - Improves computation performance . (48pp Dwg.No.l/) 
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Embedding desired node interconnection in processor - using tree 
expansion scheme to upsize processing element count, and selecting parent 
and child units 
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Abstract (Basic) : GB 2231985 A 

The process for embedding a desired node interconnection topology 
in an assembly of processing elements fixedly interconnected through 
controllably enabled element ports, involves defining a desired 
operating processor element interconnection topology. A processor 
element port-to-port arrangement for the given assembly is determined 
which maximises processor element usage. The processor element node 
topology is embedded into the assembly by enabling selected element 
ports. A desired tree node topology is embedded in an assembly of 
processing elements by defining a desired tree node interconnection 
topology. 

A processor element port-to-port connection arrangement is 
determined for the given assembly which maximises use of processors 
known to be operable. The port-to-port connection arrangement is 
modified to minimise tree depth and the modified processor element 
connection arrangement is embedded into the assembly of elements by 
enabling selected processor element ports. The run performance of the 
operating processor element is monitored to detect elements which 
become inoperable. The port-to-port connection arrangements are 
modified to minimise tree depth and to make maximum use of remaining 
operable processor elements. 
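
One simple way to picture embedding a minimum-depth tree in a fixed mesh of processing elements, and re-embedding it after a fault, is a breadth-first search restricted to operable elements. The Python sketch below is an assumption-laden illustration: the mesh, the element numbering and the `embed_tree` function are hypothetical, and the patent's own procedure may differ.

```python
from collections import deque


def embed_tree(links, root, operable):
    """Return {child: parent} for a minimum-depth tree over operable elements."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in links[node]:
            if nbr in operable and nbr not in parent:
                parent[nbr] = node  # corresponds to enabling the port pair
                queue.append(nbr)
    return parent


# Hypothetical 2x3 mesh, elements numbered 0..5 row by row.
links = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5],
         3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
tree = embed_tree(links, 0, {0, 1, 2, 3, 4, 5})
# Element 1 fails: re-embed using only the remaining operable elements.
tree_after_fault = embed_tree(links, 0, {0, 2, 3, 4, 5})
```

Breadth-first search reaches every element by a shortest path from the root, so the resulting tree has the smallest possible depth for the links that remain usable.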

ADVANTAGE - Improved fault tolerance. Reconfigurable. (45pp
Dwg.No.5/16)
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Parallel structure for modelling and training neutral networks - gives 
significantly effective performance in unsupervised learning 
environments 
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Abstract (Basic) : EP 388806 A 

A simultaneous computer structure for modelling and training 
artificial neuronal networks is linked to a host and formed from 
simple, identical processor elements as a two-dimensional matrix.
These elements are supplied with a stream of commands from a sequencer
according to the SIMD principle. The elements set on the matrix
diagonal are assigned to the neuronal network nodes and set apart for 
performing neurone functions. The non-diagonal processor elements take 
charge of the linking between the nodes and are set up for the 
function of the adjustable synaptic weightings. 

The matrix has a local adjacent network linked with the four 
immediately adjacent processors. Lines (38) lead separately from the 
neurone processors (28) in an x and y route destination. These select 
the non-diagonal synapse processors (30) simultaneously. In one
destination these lines serve the accelerated distribution of 
computation results from the neurone processors to the synapse 
processors. In the other they serve the accelerated distribution of 
correction data during training. 

ADVANTAGE - Provides a simultaneous computer structure very well
suited for installation, trials, testing and optimising free
parameters. (11pp Dwg.No.4/6)
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WPI Acc No: 90-246633/199032 

XRAM Acc No: C90-106548 

XRPX Acc No: N90-191486 
Triode array for superconducting neural network - comprises network array 
of opto-electric current-carrying filaments and controlled light source 

Patent Assignee: US SEC OF NAVY (USNA ) 

Inventor: SZU H H 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Main IPC Week 

US 4943556 A 19900724 US 88252486 A 19880930 199032 B 

Priority Applications (No Type Date) : US 88252486 A 19880930 

Abstract (Basic) : US 4943556 A 

Parallel processing computer formed of an array of triodes
comprises: 1-10000 parallel opto-electric current-carrying filaments
and 1-10000 orthogonal similar filaments in physical but not electrical
contact forming a triode array at the crossing nodes; a controlled
light supply to points of the first set of filaments just beyond the
crossing nodes; and means for receiving output signals from one set
of filaments and supplying them to the light control to adjust the
light pattern and provide iterative convergence towards a solution
matrix based on the initialisation and the input. Pref. the filaments
are made of superconducting material esp. YBa2Cu3O7.

USE/ADVANTAGE - As a neural network for superconductive computers
operating at cryogenic temps, in e.g. space, etc., which eliminates the
'N-squared bottleneck' of conventional technology. (9pp Dwg.No.3/5)
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Abstract (Basic) : EP 358292 A 

In order to initialise a computer (10) which does not include a 
local boot device, a minimum boot program is loaded from a host 
computer (14) connected to the first computer by a communications
system (12).

The node being booted firstly broadcasts a boot request message
over the communications system. A host computer determines that it is 
responsible for this function and down-loads a minimum boot control 
program. 

The network device of the slave node loads the minimum control
program into the memory of the slave and activates the program. This 
control program can move itself to the high end of the memory and link 
itself into the start-up sequence. 

The normal self-test and boot system then continues but the boot 
control program intercepts accesses to disc and provides disc access 
from the host computer. 

ADVANTAGE - Allows booting with minimal knowledge of slave 
Title Terms: OPERATE; SYSTEM; DOWN; LOAD ; METHOD; COMPUTER; MINIMUM; BOOT 
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Single node imaging appts. for multi -processor network node - has at 
least one network access method for containing resources for directing 
data transport functions in and out of node 
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Patent 
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Abstract (Basic) : EP 314909 A 

An apparatus for coupling a computer complex having several coupled
processors in a node, the node being coupled to a data
communication network having several nodes and a group of
communication lines linking the nodes. The communication lines are
grouped into transmission groups, each of the transmission groups
including at least one of said transmission lines, the computer complex
appearing to the network as a single node.

One of the processors is designated a control processor, including
resource managers for controlling functions within one of said nodes
(PU), and the other processors are designated
non-control processors. The non-control processors include at least
one network access method (NAM) containing resources for directing
data transport functions in and out of the node.
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Lighting model information processor for graphics work station - 
employs dynamic partitioning to balance computational workload among 
various parallel processors to avoid bottle-necks 
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EP 314341 Bl E 26 

Designated States (Regional) : DE FR GB IT 
DE 3853336 G Based on EP 314341 

Abstract (Basic) : EP 314341 A 

The system includes multiple floating point processing stages 
arranged and operated in pipeline. Each stage is constructed from one 
or more identical floating point processors. The system receives data 
representing coordinates in viewing space of vertices of a polygon 
and a normal at each of the vertices of the polygon. From that data 
coordinates of the vertices and screen colour intensity values 
associated with each vertex are calculated based upon a specified 
lighting model. 

Different processors may perform functions such as depth cueing, 
colour mapping and clipping from data representing ambient lighting and 
diffuse and specular reflection effects. 

USE/ADVANTAGE - Esp. in CAD/CAM. High number of polygons processed
per second means high image quality.
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physical addresses is defined as a permutation function for generating 
optimised stride accesses in an interleaved multiple device system such 
as a large, parallel processing shared memory system where the 
function comprises a bit-matrix multiplication of a presented first 
(logical) address with a predetermined matrix to produce a second 
(physical) address. The permutation function maps the address from a 
first to a second address space for improved memory performance. The
memory has n logical address bits and 2 to the power d separately
accessible memory devices and a second address that utilises n - d bits
of the first address as the offset within the referenced device node.

A bit matrix multiplication is performed between successive rows 
of the matrix and bits of the first address to produce successive d 
bits of the second address. 
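
The bit-matrix multiplication described above can be sketched in Python as follows. Each matrix row is held as an integer bit mask, and a device bit is the parity (sum modulo 2) of the address bits selected by that row. The particular matrix, the choice of the upper n - d bits as the offset, and all names are assumptions made for illustration, not values from the patent.

```python
def parity(x):
    """Sum of the bits of x modulo 2."""
    return bin(x).count("1") & 1


def map_address(addr, rows, d):
    """Map a logical address to (device, offset) via a bit-matrix product over GF(2)."""
    device = 0
    for i, row in enumerate(rows):   # one matrix row per device bit
        device |= parity(row & addr) << i
    offset = addr >> d               # assumed: upper n - d bits locate the word
    return device, offset


n, d = 6, 2
# Assumed matrix: device bit i = address bit i XOR address bit i + d.
rows = [(1 << i) | (1 << (i + d)) for i in range(d)]

# A power-of-two stride of 2**d would hit a single device under plain interleaving.
stride_devices = [map_address(a, rows, d)[0] for a in range(0, 16, 4)]
all_targets = {map_address(a, rows, d) for a in range(1 << n)}
```

With this matrix a stride of 4 visits devices 0, 1, 2, 3 in turn, and the mapping remains one-to-one over all 64 addresses.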

USE/ADVANTAGE - Highly parallel systems. Enhances power of two
stride access.
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WO 8901664 A E 151 

Designated States (National) : JP 

Designated States (Regional): AT BE CH DE FR GB IT LU NL SE 
EP 329771 A E 

Designated States (Regional) : DE FR GB IT NL 
US 5097411 A 45 
US 5155822 A 34 

US 5251322 A 46 Cont of US 8785081 

Cont of US 88184406 

Cont of US 5155822 

EP 329771 B1 E 89 Based on WO 8901664

Designated States (Regional) : DE FR GB IT NL 

DE 3855234 G Based on EP 329771 

Based on WO 8901664 



Abstract (Basic) : WO 8901664 A 

A central processing unit having associated system virtual memory 
is operated in connection with at least one operating device by 
providing a bus which couples the first central processing unit to at 
least one operating device. A reserved I/O space having starting and 
ending addresses on the bus is provided, as is a system virtual address 
space for at least one operating device within the system virtual 
memory. Mapping registers are provided in the operating device. The 
first central processing unit is operated to transfer the starting and 
ending addresses of the reserved I/O space on the bus to the mapping 
registers in the operating device. 

The CPU is operated to map the address space of the operating 
device into the system virtual address space. The CPU is operated to 
unprotect the system virtual address space where the address space of 
at least one operating device is mapped. The CPU can directly access 
the operating device without the need for direct memory access 
hardware, operating system calls, and device drivers. 

ADVANTAGE - High performance, multi-user capability.
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Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 

SU 1405070 A 10 

Abstract (Basic) : SU 1405070 A 

The circuitry contg. an input vectors comparator, an output vectors 
comparator, a modulo-2 adder, a logical adder, a multiplexer, memory, a 
set of models and two clock inputs, has each clock input connected to a 
corresp. code converter (9) and (10) with outputs to a reverse counter. 

USE/ADVANTAGE - In computer engineering for solving problems on
Petri graphs and enabling algorithms for modelling of parallel
processes to be debugged, performance is improved by provision to
change the number of vertices of transitions which can be made in
parallel. The new parts enable the number of triggered
vertex-transition models to be limited independently of the number of
inquiries for modelling. Implementation of parallel algorithms on
different computing appts. can be simulated. Bul.23/23.6.88.
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Abstract (Basic) : SU 1388882 A 

The circuit contg. groups (1,2) of interfacing units, a matrix 
(3) of switching units (4) and the groups of data inputs (7-10), has 
the triangular matrices (5,6) of switching units (4) and the 
adjusting (15), initial setting (16) and sync. (17) inputs. 

In data-exchange between peripherals in a computing package, any 
pair of peripherals can be connected. Any two peripherals connected to 
the groups of interfacing units form an information communication 
channel by issuing the same codings. When the codings agree to a 
switching unit, switching of the channel takes place. The necessary 
channels between peripherals can be formed by adjustment when prepd. by 
a 1-level at the initial setting input. 

USE/ADVANTAGE - In computer engineering as an interface for 
data-exchange between peripherals. Performance is improved by any 
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pair of peripherals being connectible. Bui . 14/15 . 4 . 88 . 
1/2 

Title Terms: DATA; EXCHANGE; SYSTEM; INTERFACE; INITIAL; SET; ADJUST; INPUT
; LOGIC; SWITCH; UNIT; TWO; TRIANGLE; MATRIX
Derwent Class: T01 

International Patent Class (Additional) : G06F-015/16 
File Segment: EPI 



14/5/53 (Item 53 from file: 351) 

DIALOG (R) File 351: DERWENT WPI 

(c) 2000 Derwent Info Ltd. All rts. reserv. 



007602591 **Image available** 

WPI Acc No: 88-236523/198834 

XRPX Acc No: N88-179730 
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Cited Patents: A3... 9116; EP 164880; No-SR.Pub; US 3906480; US 4580236; US 

4642625; WO 8500679 
Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 

EP 279227 A E 24 

Designated States (Regional) : DE FR GB IT 
US 4816814 A 23 
EP 279227 B1 E 26

Designated States (Regional) : DE FR GB IT 
DE 3889557 G Based on EP 279227 



Abstract (Basic) : EP 279227 A 

The adapter has a digital signal processor (10) utilised to manage
the overall adapter's resources. The instruction and data store (12)
is an instruction RAM which can be loaded with additional micro code
for the signal processor, and also acts as a data RAM and provides the
primary interface between signal processor (10) and the system
processor. The data store (12) also performs the function of being a
main store for the signal processor (10).

A command FIFO register (14) serves as an input buffer for passing 
sequential commands to the digital signal processor (10) via an I/O bus
(16) and connects the video display adapter to the system or host
processor. The pixel processor (18) contains logic that performs a 
number of display supporting functions such as line drawing 
manipulation which permits finite areas of the display screen to be 
manipulated. 

ADVANTAGE - Provides fast vector drawing independently of vector 
slope and position within screen area 
Title Terms: RASTER; DISPLAY; VECTOR; GENERATOR; TRIANGLE; LOGIC; MATRIX
; LINE; DRAW; UNIT; GENERATE; VECTOR; BIT; DIRECT; MASK; MONITOR; SCREEN
Derwent Class: P85; T04 
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File Segment: EPI; EngPI 
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007402593 **Image available** 

WPI Acc No: 88-036528/198805 

XRPX Acc No: N88-027595 
Data flow processing elements in parallel computer architecture - has 
data flow elements interconnected by network which allows any processing 
element to send packets of information to any other element 

Patent Assignee: DATAFLOW COMPUTER CORP (DATA-N) ; DENNIS J B (DENN-I) 

Inventor: DENNIS J 

Number of Countries: 012 Number of Patents: 006 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Main IPC Week 
WO 8800732 A 19880128 WO 87US1668 A 19870713 198805 B 

AU 8779120 A 19880210 198819 
US 4814978 A 19890321 US 86885836 A 19860715 198914 
EP 315647 A 19890517 EP 87905809 A 19870713 198920 
JP 2500393 W 19900208 JP 87505215 A 19870713 199012 
EP 315647 A4 19910131 EP 87905809 A 19870000 199515 

Priority Applications (No Type Date) : US 86885836 A 19860715 

Cited Patents: 3.Jnl.Ref; US 4153932; US 4197589; US 4413318; US 4591979;
US 4644461
Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 

WO 8800732 A E 85 

Designated States (National) : AU JP 

Designated States (Regional) : BE CH DE FR GB IT NL SE 
US 4814978 A 36 
EP 315647 A E 

Designated States (Regional) : BE CH DE FR GB IT LI NL SE 

Abstract (Basic) : WO 8800732 A 

A static dataflow architecture uses many dataflow processing
elements (110) to communicate by packets sent through a routing network
(124) via paths (122). The routing instructions correspond to the nodes
of a directed graph in which any pair of nodes connected by an arc
corresponds to a predecessor-successor pair of instructions.

Each predecessor instruction has one or more successor
instructions, and each successor instruction has one or more
predecessor instructions. The instructions include associations of
execution components and enable components identified by instruction
indices.
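
The predecessor-successor firing rule of a dataflow graph can be illustrated with a small Python sketch. Instructions fire once all their operand slots are filled, and each result is sent as a packet to the successor instructions. The data structures, the `run_dataflow` function and the example graph are hypothetical and greatly simplified compared with the architecture described in the abstract.

```python
def run_dataflow(instructions, successors, initial_tokens):
    """Fire each instruction once all of its operand slots hold a value."""
    operands = {name: {} for name in instructions}
    results = {}
    pending = list(initial_tokens)            # packets: (target, slot, value)
    while pending:
        target, slot, value = pending.pop(0)
        operands[target][slot] = value
        func, arity = instructions[target]
        if len(operands[target]) == arity:    # instruction is enabled
            args = [operands[target][k] for k in range(arity)]
            results[target] = func(*args)
            for succ, succ_slot in successors.get(target, []):
                pending.append((succ, succ_slot, results[target]))
    return results


# Hypothetical graph computing (a + b) * (a - b) for a = 5, b = 3.
instructions = {
    "add": (lambda x, y: x + y, 2),
    "sub": (lambda x, y: x - y, 2),
    "mul": (lambda x, y: x * y, 2),
}
successors = {"add": [("mul", 0)], "sub": [("mul", 1)]}
tokens = [("add", 0, 5), ("add", 1, 3), ("sub", 0, 5), ("sub", 1, 3)]
results = run_dataflow(instructions, successors, tokens)
```

No program counter is involved: `mul` runs only because both of its predecessors have delivered their packets.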

ADVANTAGE - Uses VLSI chips to provide efficient high performance 
parallel computation. 
Title Terms: DATA; FLOW; PROCESS; ELEMENT; PARALLEL; COMPUTER; ARCHITECTURE 
; DATA; FLOW; ELEMENT; INTERCONNECT; NETWORK; ALLOW; PROCESS; ELEMENT; 
SEND; PACKET; INFORMATION; ELEMENT 
Derwent Class: T01 

International Patent Class (Additional): G06F-003/00; G06F-009/30;
G06F-013/00; G06F-015/00
File Segment: EPI 
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WPI Acc No: 88-030147/198805 

XRPX Acc No: N88-022567 
Multiple CPU program management method for networking - comparing remote 
computer request with program matrix and list of currently running 
programs and accessed data files to grant access 

Patent Assignee: INT BUSINESS MACHINES CORP (IBMC); IBM CORP (IBMC)

Inventor: CROSSLEY J F 

Number of Countries: 006 Number of Patents: 005 
Patent Family: 
Patent No Kind Date 
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US 4780821 
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DE 3789175 



A
A
A
B1
G
Applicat No Kind Date
19880203 EP 87108645 A 19870616
19880315
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19940302 EP 87108645 A 
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199415 
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Cited Patents: 2.Jnl.Ref; A3... 9019; EP 136666; GB 2062914; No-SR.Pub 

Patent Details : 

Patent Kind Lan Pg Filing Notes Application Patent 

EP 254854 A E 20 

Designated States (Regional) : DE FR GB IT 
US 4780821 A 18 
EP 254854 B1 E 20

Designated States (Regional) : DE FR GB IT 
DE 3789175 G Based on EP 254854 



Abstract (Basic) : EP 254854 A 

The multi-program management method comprises the steps of 
converting a data management request originating at the server computer 
or one of the remote computers into a file sharing and record locking
protocol request message. This message is then transmitted to the 
server computer which determines whether the request message is to be 
granted. A program matrix is established with entries indicating 
whether a program can be run while another program or group of programs 
are being run on the network. 

A list of programs is maintained which are currently being run on 
the network and data files currently being accessed or otherwise not 
available for access. The program matrix and list are then checked to 
see if the request message poses a conflict with a currently running 
program. 
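
The check against the program matrix and the list of running programs and files in use can be sketched in Python as below. The program names, file names, matrix entries and the `grant` function are all hypothetical; the sketch only shows the shape of the conflict test, not the message protocol.

```python
def grant(request_program, request_files, program_matrix, running, files_in_use):
    """Return True if the request conflicts with nothing currently running."""
    for other in running:
        if not program_matrix[request_program][other]:
            return False          # the two programs may not run together
    if files_in_use & set(request_files):
        return False              # a requested data file is unavailable
    return True


# Hypothetical entries: True means the row program may run alongside the column program.
program_matrix = {
    "payroll": {"payroll": False, "report": True},
    "report": {"payroll": True, "report": True},
}
running = ["payroll"]
files_in_use = {"ledger.dat"}

ok = grant("report", ["summary.dat"], program_matrix, running, files_in_use)
blocked_by_program = grant("payroll", ["other.dat"], program_matrix, running, files_in_use)
blocked_by_file = grant("report", ["ledger.dat"], program_matrix, running, files_in_use)
```

In this example a second `payroll` instance is refused by the matrix, while `report` is refused only when it asks for a file that is already in use.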

ADVANTAGE - Allows program transfer without re-writing source code 
Title Terms: MULTIPLE; CPU; PROGRAM; MANAGEMENT; METHOD; COMPARE; REMOTE;
COMPUTER; REQUEST; PROGRAM; MATRIX; LIST; CURRENT; RUN; PROGRAM; ACCESS
; DATA; FILE; ACCESS
Derwent Class: T01 

International Patent Class (Main) : G06F-009/46 

International Patent Class (Additional): G06F-009/44; G06F-013/42;
G06F-015/16
File Segment: EPI 
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XRPX Acc No: N87-188098 
Computer system esp. for simulation of biological processes - has matrix 
of node processors interconnected via information and negator lines 

Patent Assignee: THOMAS G G (THOM-I) 
Inventor: MITTERAUER B 

Number of Countries: 008 Number of Patents: 004 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Main IPC Week 
EP 235764 A 19870909 EP 87102829 A 19870227 198736 B 

DE 3607241 A 19870910 DE 3607241 A 19860305 198737 
US 4829451 A 19890509 US 8722256 A 19870305 198922 
DE 3607241 C 19920416 DE 3607241 A 19860305 199216 

Priority Applications (No Type Date) : DE 3607241 A 19860305 

Cited Patents: A3... 8836; DE 3429078; EP 132926; No-SR.Pub; US 3473160; US
4518866
Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 

EP 235764 A G 33 

Designated States (Regional) : CH DE FR GB IT LI SE 
US 4829451 A 15 
DE 3607241 C 16 



Abstract (Basic) : EP 235764 A 

The main feature of the system is a central logic/processor unit
(2) that is constructed as a matrix of node processors that are
interconnected by information and negator lines. Each node processor
has an input/processor control stage that is bus coupled to specific
function processing units and a sub-node unit.

Communication with the logic/processor unit is via an input module 
(1), peripherals (6) and a control unit (4). A bus (9,10) connects with 
an output module (3) with a coupled controller (5).

ADVANTAGE - Provides greater memory and processing capacity.
Accurate simulation of neuronal computation systems. 
3/7 

Title Terms: COMPUTER; SYSTEM; SIMULATE; BIOLOGICAL; PROCESS; MATRIX;
NODE; PROCESSOR; INTERCONNECT; INFORMATION; NEGATE; LINE
Derwent Class: T01

International Patent Class (Additional) : G06F-015/16; G06F-015/42
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WPI Acc No: 86-253551/198639 

XRPX Acc No: N86-189615 
Data transmission switching system - has control establishing requested 
connection beginning at time based on prior established connections 

Patent Assignee: INT BUSINESS MACHINES CORP (IBMC); IBM CORP (IBMC)
Inventor: FRANASZEK P A 

Number of Countries: 007 Number of Patents: 008
Patent Family:
Patent No Kind Date
EP 195589 A
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Priority Applications (No Type Date) 

US 85713117 A 19850318; US 8748982 A
19870512; US 90541574 A 19900621
Cited Patents: 4.Jnl.Ref; A3... 8929; No-SR.Pub
Patent Details:
Patent Kind Lan Pg Filing Notes Application Patent
EP 195589 A E 22
Designated States (Regional) : DE FR GB IT
EP 195589 B1 E 28
Designated States (Regional) : DE FR GB IT
DE 3685599 G Based on EP 195589
US 34528 E 22 Cont of US 85713117
Reissue of US 4752777

Abstract (Basic): EP 195589 B 

The system includes a switching matrix (34) partitioned into
selectable data transmission paths which provide connections between
each of a no. of first ports of the matrix and selected ones of
second ports of the matrix. First path controllers (30,40) control
each data path for completing each selected connection. The system
control (32,42) is responsive to a message requesting a connection
between a first port and a selected second port to establish the
requested connection.

The system control establishes the requested connection beginning
at a determined time based on prior established connections to the
selected second port. The path controllers establish the requested
connection at the determined time.
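
The timing rule in this abstract can be illustrated with a minimal Python sketch. It is not the patented control logic: the class name `SwitchControl`, the `duration` argument and the per-port `free_at` table are assumptions introduced only to show how a start time might be derived from connections already granted to the same second port.

```python
from collections import defaultdict


class SwitchControl:
    """Toy system control keeping one 'busy until' time per second port."""

    def __init__(self):
        # Earliest time at which each second (output) port is free again.
        self.free_at = defaultdict(float)

    def request(self, first_port, second_port, now, duration):
        # The connection starts no earlier than the end of the connections
        # previously established to the same second port.
        start = max(now, self.free_at[second_port])
        self.free_at[second_port] = start + duration
        return start


control = SwitchControl()
starts = [
    control.request(first_port=0, second_port=7, now=0.0, duration=4.0),
    control.request(first_port=1, second_port=7, now=1.0, duration=2.0),
    control.request(first_port=2, second_port=3, now=1.0, duration=2.0),
]
```

In this sketch the second request waits for the first because both target port 7, while the third, aimed at an idle port, starts immediately.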

ADVANTAGE - High throughput control for wide band switching 
system. (22pp Dwg.No.3/19) 
Title Terms: DATA; TRANSMISSION; SWITCH; SYSTEM; CONTROL; ESTABLISH;
REQUEST; CONNECT; BEGIN; TIME; BASED; PRIOR; ESTABLISH; CONNECT
Derwent Class: T01; W01
International Patent Class (Main): G06F-015/16; H04Q-001/00
International Patent Class (Additional): G06F-013/00; H04Q-003/68
File Segment: EPI 
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WPI Acc No: 85-284527/198546 

XRPX Acc No: N85-212096 
Computer system for curve-solid classification and solid modelling - 
computes intersection of figures and solids represented as constructive 
solid geometry trees 

Patent Assignee: UNIV ROCHESTER (UYRP ) 

Inventor: ELLIS J L; KEDEM G 

Number of Countries: 012 Number of Patents: 004 
Patent Family: 
Patent No Kind Date 



Applicat No Kind Date 
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Priority Applications (No Type Date): US 84608295 A 19840508 
Cited Patents: 2.Jnl.Ref; A3... 8840; No-SR.Pub 
Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 

EP 160848 A E 99 

Designated States (Regional): AT BE CH DE FR GB IT LI LU NL SE 
EP 160848 B1 E 67

Designated States (Regional) : AT BE CH DE FR GB IT LI LU NL SE 
DE 3587668 G Based on EP 160848 



Abstract (Basic) : EP 160848 A 

A 1+log2(N) by N grid or array (2) of processors is provided
of which a bottom row is formed of N primitive classifiers (1). Each
classifier is connected to a combine processor of the log2(N) by N
array. A combine processor at the top left corner is always a root
processor and is connected to a direct memory access unit (4) which
passes the output of the curve-solid classification system to a host
main memory (6).

An interface (3) allows the host computer (7) to read and write 
directly to the registers of the N primitive classifiers, to load the 
required recurrence coefficients into the registers. The functions 
computed by the primitive classifiers are: compute union; compute 
intersection; compute right input; compute bottom input; compute bottom 
input minus right input; pass information from the right; pass 
information from the bottom; and no-operation. 

USE - For image generation of solids on CRTs or hard copy 
printers. Used in computer-aided design and computer-assisted 
manufacturing applications, or for robot and machine tool simulation. 

6/29 

Title Terms: COMPUTER; SYSTEM; CURVE; SOLID; CLASSIFY; SOLID; MODEL;
COMPUTATION; INTERSECT; FIGURE; SOLID; REPRESENT; CONSTRUCTION; SOLID;
GEOMETRY; TREE
Index Terms/Additional Words: ROBOT; TOOL; CAM; IMAGE; CRT; PRINT
Derwent Class: P85; T01; T06; X25
International Patent Class (Main): G06F-015/72
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Data processing system for encoded control points - sends command to 
matrix maker card defining geometrical transformation to be performed on
graphical illustration
Patent Assignee: BOSCH R CORP (BOSC ) 
Inventor: ANDREWS D H; LUCHT P H; PUTNAM L K 
Number of Countries: 006 Number of Patents: 003 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Main IPC Week 
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4208810 
Patent Details: 

Patent Kind Lan Pg Filing Notes Application Patent 

EP 146250 A E 169 

Designated States (Regional) : DE FR GB IT 

Abstract (Basic): EP 146250 A 

A number of separate micro-programmed circuit cards is used, each
of which is programmed to perform a specific processing operation. A
command is first sent to a matrix maker card (201). This card,
together with a matrix multiplier card (202), then calculates a
transformation matrix representing the desired transformation.

Electronic representations of control data points are then
transmitted to the pipeline for processing and multiplied by the
transformation matrix, previously computed, in a vector multiplier
circuit card (203). Next, the control points are clipped to the planes
of a viewing frustum by a number of clipper cards (205-209).

USE/ADVANTAGE - Reduces quantity of data needed to be stored to
achieve real time animation. Increased processing speed. Does not need
to convert curved portions into numerous line segments.

2/9 

Title Terms: DATA; PROCESS; SYSTEM; ENCODE; CONTROL; POINT; SEND; COMMAND;
MATRIX; MAKER; CARD; DEFINE; GEOMETRY; TRANSFORM; PERFORMANCE;
GRAPHICAL; ILLUSTRATE
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PUB. NO.: 10-031615 [JP 10031615 A]
PUBLISHED: February 03, 1998 (19980203)
INVENTOR(s): SHIMAMURA SAKAE
APPLICANT(s): NEC CORP [000423] (A Japanese Company or Corporation), JP
              (Japan)
APPL. NO.: 08-205208 [JP 96205208]
FILED: July 16, 1996 (19960716)
INTL CLASS: [6] G06F-012/00; G06F-013/00; G06F-013/00; G06F-015/16
JAPIO CLASS: 45.2 (INFORMATION PROCESSING -- Memory Units); 45.4
             (INFORMATION PROCESSING -- Computer Applications)


ABSTRACT 

PROBLEM TO BE SOLVED: To suppress the load of directory server or network 
in a distributed hyper media system formed for possessing node link 
structure to be presented for supporting the navigation of user from the 
directory server. 

SOLUTION: A browsing device 40 is provided with a cache 18 for storing the
node link structure acquired from a directory server 20. When the
acquisition of contents of a certain node is requested from the user
through an input part 11, a contents possessing part 13 acquires the
contents of the relevant node from a distributed hyper media space 0 and
a contents output part 12 outputs these contents to the user. At the same
time, a communication part 15 for directory server first acquires only the
node link structure within the range linked from this acquired node in
fewer than the prescribed number of link steps from the cache 18; only
when the node structure does not exist in the cache 18 is it acquired
from the directory server 20, and a node link structure display part 14
renders this node structure as a graph and presents it to the user.
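
The cache-first lookup described in this abstract can be sketched as follows. This is a rough Python illustration under stated assumptions: the function `links_within`, the dictionary-based cache and the `fetch_from_server` callback are hypothetical names, and the link structure is modelled simply as a mapping from a node to the nodes it links to.

```python
def links_within(start, max_steps, cache, fetch_from_server):
    """Collect the link structure reachable from start in max_steps hops.

    cache maps node -> list of linked nodes; the directory server is
    consulted only for nodes that are missing from the cache.
    """
    frontier, seen = [start], {start}
    structure = {}
    for _ in range(max_steps):
        next_frontier = []
        for node in frontier:
            if node not in cache:
                cache[node] = fetch_from_server(node)  # cache miss only
            structure[node] = cache[node]
            for target in cache[node]:
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return structure


# Hypothetical directory server contents and a log of the requests it gets.
server = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"], "e": []}
server_calls = []


def fetch(node):
    server_calls.append(node)
    return server[node]


cache = {}
first = links_within("a", 2, cache, fetch)
second = links_within("a", 2, cache, fetch)  # answered from the cache
```

The second call returns the same structure without any further server request, which is the load reduction the abstract aims at.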



14/5/61 (Item 2 from file: 347) 
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METHOD AND DEVICE FOR LAYING OUT DATA IN DISTRIBUTED STORAGE TYPE PARALLEL 
COMPUTERS 



PUB. NO.: 09-282290 [JP 9282290 A]
PUBLISHED: October 31, 1997 (19971031)
INVENTOR(s): SHINDO TATSUYA
             TAGUCHI KATSUHIKO
APPLICANT(s): FUJITSU LTD [000522] (A Japanese Company or Corporation), JP
              (Japan)
APPL. NO.: 08-095749 [JP 9695749]
FILED: April 17, 1996 (19960417)
INTL CLASS: [6] G06F-015/16; G06F-009/45
JAPIO CLASS: 45.4 (INFORMATION PROCESSING -- Computer Applications); 45.1
             (INFORMATION PROCESSING -- Arithmetic Sequence Units)

ABSTRACT 

PROBLEM TO BE SOLVED: To reduce the burdens on programming and to improve
the processing performance of distributed storage type parallel computers
by automatically deciding the arrangement of parallelizable parts included
in sequential programs at the time of parallelizing the sequential
programs.

SOLUTION: A DBR graph for which DBRs (data layout basic areas) are made
into nodes and the nodes are connected by directed branches in an
execution order is generated (A1). The candidates of data layouts for
plural PEs are listed for the respective DBRs in the DBR graph (A2) and a
global alignment graph for which the respective candidates are the nodes
and the nodes are connected by the directed branches by all combinations
corresponding to the directed branches of the DBR graph is generated
(A3). A shortest time route from a start point to an end point along the
nodes and the directed branches in the global alignment graph is
extracted (A4) and the data layout to the plural PEs is performed
corresponding to the candidates on the shortest time route (A5).
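
Steps A1 to A5 amount to a shortest-path search over layout candidates. The Python sketch below assumes, for simplicity, that the DBR graph is a linear chain of regions; the function name `choose_layouts`, the cost tables and the redistribution cost are hypothetical and are not taken from the patent.

```python
def choose_layouts(exec_cost, redist_cost):
    """Pick one layout per region so that total time is minimal.

    exec_cost[i][c]: assumed run time of region i under candidate layout c.
    redist_cost(a, b): assumed time to move data from layout a to layout b.
    Returns (total_time, list of chosen layouts).
    """
    # best[c] = (time of the cheapest route ending in layout c, that route)
    best = {c: (t, [c]) for c, t in exec_cost[0].items()}
    for region in exec_cost[1:]:
        nxt = {}
        for c, t in region.items():
            prev_time, prev_path = min(
                ((best[p][0] + redist_cost(p, c), best[p][1]) for p in best),
                key=lambda item: item[0],
            )
            nxt[c] = (prev_time + t, prev_path + [c])
        best = nxt
    return min(best.values(), key=lambda item: item[0])


def redist(a, b):
    # Hypothetical flat cost for changing layout between regions.
    return 0.0 if a == b else 1.5


# Hypothetical costs: two candidate layouts for each of three regions.
exec_cost = [
    {"row": 2.0, "col": 5.0},
    {"row": 6.0, "col": 1.0},
    {"row": 2.0, "col": 4.0},
]
total, layouts = choose_layouts(exec_cost, redist)
```

With these numbers the cheapest route switches layout twice, because the redistribution cost is smaller than the time saved inside each region.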
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( INFORMATION PROCESSING — Memory Units); 45.4 (INFORMATION 
PROCESSING — Computer Applications) 

ABSTRACT 

PROBLEM TO BE SOLVED: To provide a decentralized processing system which 
eliminates a conflict of acquisition of a sequentially used common 
resource when plural processors which perform decentralized processes 
share the sequentially used common resource and allows the respective 
processors to use the sequentially used common resource in the order of 
the completion of the job processes. 

SOLUTION: Respective S-FEP (slave Front End Processor) 15-17 before sending 
image data to a printer 18 send request-to-send messages to a scheduler 
14, and send image data to the printer 18 after receiving OK-to-send 
messages from the scheduler 14. Here, the scheduler 14 manages the 
operation state and transmission state of the printer 18 by a printer state 
management part and sequentially processes job nodes that a transfer job 
management part manages according to the state of the printer 18.
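
The request-to-send / OK-to-send exchange can be sketched as a small queue-based scheduler. This Python sketch is only an illustration: the class `Scheduler`, its method names and the `log` list are assumptions, and a first-come, first-served queue stands in for the job-node management described in the abstract.

```python
from collections import deque


class Scheduler:
    """Grants the shared printer to one front-end processor at a time."""

    def __init__(self):
        self.pending = deque()      # job nodes in order of request
        self.printer_busy = False   # transmission state of the printer
        self.log = []               # OK-to-send messages that were issued

    def request_to_send(self, fep, job):
        self.pending.append((fep, job))
        self._dispatch()

    def transfer_done(self):
        self.printer_busy = False
        self._dispatch()

    def _dispatch(self):
        # Issue OK-to-send only when the printer is free and a job waits.
        if not self.printer_busy and self.pending:
            fep, job = self.pending.popleft()
            self.printer_busy = True
            self.log.append(("OK-to-send", fep, job))


scheduler = Scheduler()
scheduler.request_to_send(15, "job-a")
scheduler.request_to_send(16, "job-b")
granted_before = list(scheduler.log)
scheduler.transfer_done()
granted_after = list(scheduler.log)
```

Only the first requester is granted the printer at once; the second receives its OK-to-send after the first transfer is reported complete.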
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SEMICONDUCTOR DEVICE AND ARITHMETIC OPERATION SYSTEM USING THE SAME, IMAGE 

PROCESSING SYSTEM, SOUND SIGNAL PROCESSING SYSTEM, PATTERN RECOGNITION 
SYSTEM, SIGNAL PROCESSING SYSTEM, PARALLEL DATA PROCESSING SYSTEM, 
AND VIDEO SIGNAL PROCESSING SYSTEM 



PUB. NO.: 09-212339 [JP 9212339 A] 

PUBLISHED: August 15, 1997 (19970815) 
INVENTOR (s): OMI TADAHIRO 
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APPLICANT(s) : OMI TADAHIRO [000000] (An Individual), JP (Japan) 
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42.4 (ELECTRONICS — Basic Circuits); 44.1 (COMMUNICATION — 

Transmission Circuits & Antennae)
JAPIO KEYWORD: R097 (ELECTRONIC MATERIALS -- Metal Oxide Semiconductors, MOS)

ABSTRACT 

PROBLEM TO BE SOLVED: To constitute a large-scale highly parallel system, 
having operation elements coupled closely with one another, in one chip by 
using a device which has a floating node.

SOLUTION: Capacity means 8-13 are connected to input terminals 16-21 
respectively. The common connection part between the uninverted input 
terminal 6 of a high-input-impedance operational amplifier 1 and capacity 
means 8-11, and the contact of the common connection part between the 
inverted input terminal 7 and capacity means 12-15 are floating nodes 
respectively. This arithmetic circuit comes into a signal operation mode 
wherein the signal from a precedent-stage operational amplifier 23 is 
received and processed when a reset signal 5 is negative. The gain setting 
of the operational amplifier 1 is determined by the capacity ratio of a 
capacity means 15 for negative feedback, and a grounded capacity means 
14 and capacity means 8-13. By changing the capacity ratio, 
multi-valued linear operation is made possible. 
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45.4 (INFORMATION PROCESSING — Computer Applications); 45.3 
(INFORMATION PROCESSING — Input Output Units); 45.9 
(INFORMATION PROCESSING — Other) 

ABSTRACT 

PROBLEM TO BE SOLVED: To recognize distribution of accurate performance
data intuitively by plotting performance data in the direction of
height of an axis orthogonal to a picture simulating each node of 
parallel operation computers on a plane coordinate. 

SOLUTION: A picture simulating computers on a plane coordinate is plotted 
on a display screen 1 and performance data of collected parallel computer 
system are converted and displayed in the direction of height of an axis, 
orthogonal to the picture. In this case, a minimum unit of display is a
node and a network, and a node part is made up of, e.g. a frame 101, a 
CPU 102, a memory 103, a reception amount 104 and a transmission amount 
105, and the network is made up of a 1st communication channel 98, a 2nd 
communication channel 99 and a router 100 and they are plotted. A thickness 
in the direction of height of a displayed image in response to the 
received performance data is changed for each part except the frame 101 
to represent the performance data at that time. While expressing a 
difference from each node of the parallel computer system, a cross 
reference with an actual indication is easily taken through the display 
method as above. 
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ABSTRACT 

PURPOSE: To realize the quick reaction of VP capacity changing control. 

CONSTITUTION: A repeating node 2 repeating plural VPs is provided with a 
connecting function part 30, an OAM cell copying part 41, the possibility 
of VP capacity change judging part 42 and a returning part 43. The 
connecting function part 30 transfer-processes an OAM cell received from an
upstream node to a downstream node. The OAM cell copying part 41
copies the OAM cell in parallel with the processing of the connecting
function part 30. The possibility of VP capacity change judging part 42
extracts a control message from the OAM cell copied by the OAM cell copying
part 41 to judge the possibility of changing the capacity. A returning
part 43 transmits the judging result of the possibility of VP capacity
change judging part 42 to a transmission terminal node 1. A capacity
change control part 11 in the transmission terminal node 1 judges the 
capacity change of VP from the transmission terminal node 1 to a 
terminating node 3 to be possible when the judging result from the 
repeating node 2 may image . 
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ABSTRACT 

PURPOSE: To perform processing at a high speed by eliminating the need for
reallocating a processing program to a processor and rearranging data at the
time of execution even when network constitution is altered.

CONSTITUTION: Respective nodes 1-(n) are each provided with >=2
processors PE1 and PE2. The processing program is allocated to the
processors PE1 and PE2 in the order of the nodes 1-(n) without being made
to correspond to the transfer direction of the data, which are arranged
while made to correspond to the processors PE1 and PE2. When the data are
transferred to the processors PE1 and PE2, the processors use the data
arranged on the respective processors and the transferred data to perform
processing for finding the product of a matrix, etc. The processors are
allocated in the order of the nodes, so the network constitution can
easily be altered. Even when the network constitution is altered, the need
to rearrange the data at the time of the execution is eliminated and
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the processing is performed at a high speed. 
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ABSTRACT 

PURPOSE: To attain the direct use of accumulated software resources by
converting each processing content to the processing code of machine
language level by a conversion means, and automatically generating the
execution codes of parallel computers by an execution code generating
means.

CONSTITUTION: A flow chart analysis part 12 identifies dependence for each
node and between the nodes and other information from a graphic, a
line, and a character described on an inputted flow chart. One of
conversion parts 16a-16e in accordance with the respective processing
content of the node and other information identified by the flow
analysis part 12 is selected by a conversion decision part 14, then,
conversion is performed. One execution code can be generated from plural
conversion results obtained by the conversion parts 16a-16e which convert
the processing content of the node to the processing code of machine
language level corresponding to a decision result by an execution code
generating part 30.
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JOURNAL: Section: E, Section No. 1586, Vol. 18, No. 407, Pg. 130, July 

29, 1994 (19940729) 

ABSTRACT 

PURPOSE: To reduce the entire contention arbitration time and a transfer
path for various data by forming a large scale matrix switch with sets of
lots of small scale matrix switches and processing contention arbitration
in parallel through divided processing.

CONSTITUTION: Groups of divided request generating sources R1-R8 and
resources S1-S8 are mutually connected by each of small scale matrix
switches G(1, 1)-G(4, 4) respectively. Through the constitution above,
access requests and/or data from the request generating sources R1-R8 are
directly transferred to the matrix switches G(1, 1)-G(4, 4)
interconnected by input side connection lines L1-L8, and the access
requests and/or data from the matrix switches G(1, 1)-G(4, 4) to the
resources S1-S8 are directly transferred to the resources S1-S8
interconnected by output side connection lines L01-L08. Thus, the number of
passing cross points is decreased and the contention arbitration is
implemented in parallel by using each small scale matrix switch, then the
entire processing speed is quickened.
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ABSTRACT

PURPOSE: To efficiently process a finite element method in a parallel
computer.


CONSTITUTION: A region of analysis is divided into finite elements in a
procedure 1. Respective element matrices corresponding to respective finite
elements are prepared in parallel by respective processor elements in a
state where the respective finite elements are allocated to the respective
processor elements in the parallel computer in a procedure 2. Respective
pieces of data in the respective element matrices and the numbers of nodes
generating the respective elements are transferred to the respective
processor elements taking charge of the node numbers and data on the
element matrices are added in the processor element in charge of the
respective nodes in parallel so as to efficiently generate a global
matrix at high speed in a state where the respective nodes are
allocated to the respective processor elements in a procedure 3. Wasteful
data transfer at the time of solving an equation expressed by the matrix
is eliminated by using the number of the node making the element
transferred at the time of preparing the global matrix for data transfer
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in a procedure 4 . 
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ABSTRACT 

PURPOSE: To reduce influence upon processing execution time due to
communication between processors by allocating each node of a data flow
graph so that the number of packets to flow between the processors
becomes small.

CONSTITUTION: In respect of an objective node to be allocated selected by
a next allocated node selecting means 14, a preceding node to output an
arc to be inputted to the objective node to be allocated is searched by a
preceding node searching means 15. Next, the processor allocated to the
preceding node is searched by a searching means 16 for the processor
allocated to the preceding node. The processor is allocated to the
objective node to be allocated by an allocated processor determining
means 4 according to the allocating state of the processor so that the
number of the packets to flow between the processors becomes small. Then,
the data flow graph is divided.
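
The allocation idea can be sketched as a greedy pass over the data flow graph. The Python below is an assumption-laden illustration, not the patented means: `allocate`, `cross_arcs`, the per-processor `capacity` limit and the tie-breaking rule are all hypothetical, and the sketch requires that the total capacity covers every node.

```python
from collections import Counter


def allocate(nodes, preds, n_procs, capacity):
    """Place each node on the processor already holding most predecessors.

    nodes: node names in allocation order, predecessors first.
    preds: dict mapping a node to the nodes whose arcs feed into it.
    Assumes n_procs * capacity >= len(nodes).
    """
    placed = {}
    load = Counter()
    for node in nodes:
        # Count how many preceding nodes sit on each processor.
        votes = Counter(placed[p] for p in preds.get(node, []))
        candidates = [p for p in range(n_procs) if load[p] < capacity]
        # Most predecessors first, then lighter load, then lower index.
        proc = max(candidates, key=lambda p: (votes[p], -load[p], -p))
        placed[node] = proc
        load[proc] += 1
    return placed


def cross_arcs(placed, preds):
    """Number of arcs whose two ends lie on different processors."""
    return sum(placed[p] != placed[n] for n, ps in preds.items() for p in ps)


# Hypothetical graph: two independent chains a->c->e and b->d->f.
preds = {"c": ["a"], "d": ["b"], "e": ["c"], "f": ["d"]}
placed = allocate(["a", "b", "c", "d", "e", "f"], preds, n_procs=2, capacity=3)
packets = cross_arcs(placed, preds)
```

Each chain ends up on its own processor here, so no arc crosses a processor boundary and no packet would need to travel between them.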
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Circuits, LSI & GS 
JOURNAL: Section: P, Section No. 1397, Vol. 16, No. 365, Pg. 151, 

August 06, 1992 (19920806) 

ABSTRACT 

PURPOSE: To execute large scale wiring processing with memory small in
capacity at high speed by executing route decision at every grating unit
in each small area of rough wiring route in each small area in parallel.

CONSTITUTION: A CPU21 generalizes the whole operation, and processors (PE)
22a-22d, 23a-23h perform parallel processing at each processing stage,
and they are connected in hierarchical fashion, and are connected to shared
memory 20, 24a-24d, and 25. In such a case, a division area in which a
wiring area is divided is divided into the small areas, hence, each of the
small areas is set as a node, and a graph provided with a branch
between points in accordance with the small area is generated when
neighboring relation exists between the small areas. The route between the
small areas is retrieved from the connection relation of the graph, and
it is set as the rough wiring route, and finally, detail wiring is
performed in parallel at every small unit on the rough wiring route.
Thereby, it is possible to execute retrieval between elements in a large
scale wiring area at high speed, and to reduce required memory capacity.
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ABSTRACT 

PURPOSE: To guarantee the uniformity of data discriminators by checking the 
flow of processing such as the loop structure of respective processors at 
the time of compiling, executing SEND processing and RECEIVE processing 
and developing the loop structure until the uniformity is held. 

CONSTITUTION: Different data discriminators are assigned to respective
combinations of all SEND and RECEIVE processing expressed on a program. An
S node for executing the SEND processing, an R node for executing the
RECEIVE processing, a converging point node for expressing loop
structure, and a branch processing node are extracted to form a control 
flow graph 6. The control flow graph 6 expresses the order relation of 
program execution processing. All corresponding combinations of S and R 
nodes included in the loop structure are reversely searched and whether 
the execution of an R node corresponding to the i-th loop is ended or not 
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prior to the S node in the i-th loop can be executed. When said R node 

is not executed, the loop is developed twice. Consequently, uniformity of
data discriminators indicating the correspondence between the SEND
processing and RECEIVE processing can be guaranteed.
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ABSTRACT 

PURPOSE: To shorten a load/storing time and to attain high speed
calculation by processing with a long vector length consisting of a
connecting register in the data processing of a column direction and
processing with a length corresponding to a row consisting of a division
register in the operation of the row and the row.

CONSTITUTION: In the processing P1, the vector registers are connected and
a coefficient matrix is loaded with the long vector length, namely, a(sub
11)-a(sub nn) are integrally loaded. Then, in the processing P2, the
elements of the coefficients disposed at an equal interval on the
coefficient matrix are collected to a series of registers by the use of
the mask of the pattern of the equal interval. In a P3, the constitution of
the register is changed so that the data of one row match
one vector register. Then, in a P4, a discharge calculation for bringing
the element of a partial coefficient to '0' on the reconstituted register
is executed. These processings P1-P4 are repeated on all the rows.
According to the processings, the coefficient matrix goes to a triangle
matrix as a discharge calculation result R. In such a way, data is
stored in the register as much as possible to reduce the number of memory
accesses and shorten the load/storing time.
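
The row-by-row "discharge calculation" reads like ordinary elimination to a triangular matrix. The Python sketch below shows only that arithmetic; it does not model the vector register reconfiguration of steps P1 to P3, and the function name `eliminate` and the absence of pivoting (nonzero pivots are assumed) are choices made for this illustration.

```python
def eliminate(a):
    """Return an upper-triangular copy of square matrix a (no pivoting)."""
    u = [row[:] for row in a]
    n = len(u)
    for k in range(n):                # pivot row
        for i in range(k + 1, n):     # rows below the pivot
            factor = u[i][k] / u[k][k]
            # One whole-row update, the kind of operation a vector
            # register holding a full row could perform at once.
            u[i] = [x - factor * y for x, y in zip(u[i], u[k])]
    return u


triangular = eliminate([[2, 1, 1], [4, 3, 3], [8, 7, 9]])
```

Every entry below the diagonal of `triangular` is zero, matching the triangular result R mentioned in the abstract.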
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ABSTRACT 



PURPOSE: To equalize a load of a processor by successively forward and
backward substituting an LU decomposed matrix of respective factor
matrices to vector data inputted from a repetition calculation circuit by
a forward and backward substitution circuit and performing an inverse
matrix operation of an approximate matrix.

CONSTITUTION: In an asymmetrical linear equation obtained by approximating
five point differences of a two-dimensional advection diffusion equation
applied on a rectangular area, the equation to be solved is represented by
an equation 1. When an integral vector is considered to be U and a
difference equation is represented as A.u=f, A has the structure as shown
in the figure, D corresponds to D of the equation 1, Ax to Axl, and Ay to
Ayl, Ayu, respectively. The matrix A is inputted by a signal line 11 and
the vector (f) is inputted and in a factor matrix decomposing circuit 1,
lower triangle matrices Lx, Ly and upper triangle matrices Ux, Uy
are calculated. Thereby, when a pipe line system is used in the processor,
it is not required to use a list vector system and when a parallel
processor system is used, the number of the operations capable of being
executed in parallel is made constant and the equality in the load of the
operation processor can be maintained.
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The current compute environment that most researchers are using for the 
calculation of 3D unsteady Computational Fluid Dynamic (CFD) results is a 
super-computer class machine. The Massively Parallel Processors (MPPs)
such as the 160 node IBM SP2 at NAS and clusters of workstations
acting as a single MPP (like NAS's SGI Power-Challenge array and the J90
cluster) provide the required computation bandwidth for CFD calculations of 
transient problems. If we follow the traditional computational analysis 
steps for CFD (and we wish to construct an interactive visualizer) we need 
to be aware of the following: (1) Disk space requirements. A single 
snap-shot must contain at least the values (primitive variables) stored at 
the appropriate locations within the mesh. For most simple 3D Euler solvers 
that means 5 floating point words. Navier-Stokes solutions with turbulence 
models may contain 7 state-variables. (2) Disk speed vs. Computational 
speeds. The time required to read the complete solution of a saved time 
frame from disk is now longer than the compute time for a set number of 
iterations from an explicit solver. Depending on the hardware and solver,
an iteration of an implicit code may also take less time than reading the
solution from disk. If one examines the performance improvements in the
last decade or two, it is easy to see that depending on disk performance
(vs. CPU improvement) may not be the best method for enhancing
interactivity. (3) Cluster and Parallel Machine I/O problems. Disk access
time is much worse within current parallel machines and clusters of
workstations that are acting in concert to solve a single problem. In this 
case we are not trying to read the volume of data, but are running the 
solver and the solver outputs the solution. These traditional network 
interfaces must be used for the file system. (4) Numerics of particle 
traces. Most visualization tools can work upon a single snap shot of the 
data but some visualization tools for transient problems require dealing 
with time. (Derived from text) 

SOURCE OF ABSTRACT/SUBFILE: NASA CASI 
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As we gain experience with parallel file systems, it becomes increasingly 
clear that a single solution does not suit all applications. For example, 
it appears to be impossible to find a single appropriate interface, caching 
policy, file structure, or disk-management strategy. Furthermore, the 
proliferation of file-system interfaces and abstractions makes applications
difficult to port. We propose that the traditional functionality of
parallel file systems be separated into two components: a fixed core that
is standard on all platforms, encapsulating only primitive abstractions and
interfaces, and a set of high-level libraries to provide a variety of
abstractions and application-programmer interfaces (APIs). We present our
current and next-generation file systems as examples of this structure. 
Their features, such as a three-dimensional file structure, strided read 
and write interfaces, and I/O-node programs, are specifically designed 
with the flexibility and performance necessary to support a wide range of 
applications. (Author) 
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This paper discusses some aspects of design of a data distributed, 
massively parallel volume rendering library for runtime visualization of 
parallel computational fluid dynamics simulations in a message-passing 
environment. Unlike the traditional scheme in which visualization is a 
postprocessing step, the rendering is done in place on each node
processor. Computational scientists who run large-scale simulations on a
massively parallel computer can thus perform interactive monitoring of 
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their simulations. The current library provides an interface to handle 
volume data on rectilinear grids. The same design principles can be 
generalized to handle other types of grids. For demonstration, we run a 
parallel Navier-Stokes solver making use of this rendering library on the 
Intel Paragon XP/S. The interactive visual response achieved is found to be 
very useful. Performance studies show that the parallel rendering process 
is scalable with the size of the simulation as well as with the parallel
computer. (Author) 
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This paper presents a divide-and-conquer ray-traced volume rendering 
algorithm and its implementation on networked workstations and a massively 
parallel computer, the Connection Machine CM-5. This algorithm distributes
the data and the computational load to individual processing units to
achieve fast, high-quality rendering of high-resolution data, even when
only a modest amount of memory is available on each machine. The volume
data, once distributed, is left intact. The processing nodes perform
local ray-tracing of their sub-volume concurrently. No communication
between processing units is needed during this local ray-tracing process.
A sub-image is generated by each processing unit and the final image is
obtained by compositing sub-images in the proper order, which can be
determined a priori. Implementations and tests on a group of networked
workstations and on the Thinking Machines CM-5 demonstrate the practicality
of our algorithm and expose different performance tuning issues for each
platform. We use data sets from medical imaging and computational fluid
dynamics simulations in the study of this algorithm. (DOE)
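
The final compositing step can be illustrated with a short Python sketch. The abstract does not state which compositing operator the authors use; the sketch assumes the standard front-to-back "over" operator on premultiplied (color, alpha) pixels, and the names `over` and `composite` are hypothetical.

```python
from functools import reduce


def over(front, back):
    """Composite two sub-images pixel by pixel.

    Each pixel is (color, alpha) with premultiplied color; front is the
    sub-image nearer to the viewer.
    """
    return [
        (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)
        for (cf, af), (cb, ab) in zip(front, back)
    ]


def composite(sub_images_front_to_back):
    # The viewing order is assumed to be known in advance.
    return reduce(over, sub_images_front_to_back)


# Hypothetical two-pixel sub-images from two processing units.
near = [(0.5, 0.5), (0.0, 0.0)]
far = [(1.0, 1.0), (0.25, 0.5)]
image = composite([near, far])
```

Because the "over" operator is associative, sub-images could also be merged pairwise in a tree rather than in one sequential pass, provided the front-to-back order is preserved.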
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One of the major achievements in engineering science has been the 
development of computer algorithms for solving nonlinear differential 
equations such as the Navier-Stokes equations. In the past, limited 
computer resources have motivated the development of efficient numerical 
schemes in computational fluid dynamics (CFD) utilizing structured meshes. 
The use of structured meshes greatly simplifies the implementation of CFD 
algorithms on conventional computers. Unstructured grids, on the other hand,
offer an alternative for modeling complex geometries. Unstructured meshes
have irregular connectivity and usually contain combinations of triangles,
quadrilaterals, tetrahedra, and hexahedra. The generation and use of
unstructured grids poses new challenges in CFD. The purpose of this note is
to present recent developments in the unstructured grid generation and flow
solution technology. (H.A.)

SOURCE OF ABSTRACT/SUBFILE: NASA CASI 
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Advanced algorithms for two and three dimensional modeling of 
semiconductor devices have been developed, implemented on parallel 
computers and tested using several high performance technologies. 
Computational limitations for semiconductor device analysis have been 
extended to greater than 100000 nodes and speedup factors greater than 
10-fold have been realized using distributed memory (MIMD) architectures. 
Two classes of algorithms have been explored using parallel processing:
distributed multifrontal (DMF) and Monte Carlo (MC). The DMF algorithm has
been implemented and tested for 3D device analysis of MOS, bipolar and 
latchup examples using iterative methods for single- and two-carrier 
transport. A windowed MC analysis of 2D hot carrier effects in Si MOS and 
GaAs MESFET devices has been achieved on several parallel architectures
with near ideal speedup factors up to 20 processors. Usability of device
simulation has been enhanced and demonstrated through applications. The 
range of technologies that can be modeled with the 2D PISCES program now 
includes: GaAs, GeSi heterojunctions and photo- and other
carrier-generation processes. Moreover, layout-driven input 2D/3D output
visualization capabilities increase user efficiency. Device and
technology scaling applications have been used to evaluate both 2D and 3D 
device capabilities. BiCMOS scaling issues and new structures have been 
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evaluated using PISCES and mixed-mode (device circuit) capabilities (DTIC) 
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Systolic Kalman filter (SKF) designs based on a triangular array 
(triarray) configuration are presented. A least squares formulation, which 
is an expanded matrix representation of the state space iteration, is 
adopted to develop an efficient iterative QR triangularization and 
consecutive data prewhitening formulations. This formulation has advantages 
in both numerical accuracy and processor utilization efficiency. Moreover, 
it leads naturally to pipelined architectures such as systolic or wavefront 
arrays. For an n state and m measurement dynamic system, the SKF triarray 
design uses n(n + 3)/2 processors and requires only 4n + m timesteps to 
complete one iteration of prewhitened Kalman filtering system. This means a 
speedup factor of approximately n-squared/4 when compared with a sequential 
processor. Also proposed for the colored noise case are data prewhitening 
triarrays which offer compatible speedup performance for the 
preprocessing stage. Based on a comparison of several competing 
alternatives, the proposed array processor may be considered a most 
efficient systolic or wavefront design for Kalman filtering. (I.E.) 

SOURCE OF ABSTRACT/SUBFILE: AIAA 
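
Note: the resource figures quoted in the abstract above can be evaluated directly. The sketch below only restates the abstract's formulas; the function names are hypothetical, and no cost model for the sequential processor is assumed beyond the quoted n-squared/4 estimate.

```python
def skf_triarray_cost(n, m):
    """Processors and timesteps per iteration, as quoted in the abstract.

    n: number of states, m: number of measurements.
    """
    processors = n * (n + 3) // 2   # n(n + 3) is always even, so this is exact
    timesteps = 4 * n + m
    return processors, timesteps


def quoted_speedup(n):
    """Approximate speedup over a sequential processor, per the abstract."""
    return n * n / 4


# Example: a system with 4 states and 2 measurements.
processors, timesteps = skf_triarray_cost(4, 2)
```

For n = 4 and m = 2 this gives 14 processors and 18 timesteps per iteration.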
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Abstract: A parallel unstructured finite element (FE) reacting flow 
solver designed for message passing MIMD computers is described. This 
implementation employs automated partitioning algorithms for load 
balancing unstructured grids, a distributed sparse matrix representation 
of the global FE equations, and parallel Krylov subspace iterative 
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solvers. In this paper, a number of issues related to the efficient 
implementation of parallel unstructured mesh applications are presented. 
These issues include the differences between structured and unstructured 
mesh parallel applications, major communication kernels for unstructured 
Krylov iterative solvers, automatic mesh partitioning algorithms, and the 
influence of mesh partitioning metrics and single-node CPU performance 
on parallel performance. Results are presented for example FE heat
transfer, fluid flow and full reacting flow applications on a 1024 
processor nCUBE 2 hypercube and a 1904 processor Intel Paragon. Results 
indicate that very high computational rates and high scaled efficiencies 
can be achieved for large problems despite the use of sparse matrix data 
structures and the required unstructured data communication. (Author 
abstract) 25 Refs. 
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Abstract: Most of existing visualization applications use 3D geometry 
as their basic rendering primitive. As users demand more complex datasets, 
the memory requirements for retrieving and storing large 3D models are 
becoming excessive. In addition, the current 3D rendering hardware is 
facing a large memory bus bandwidth bottleneck at the processor to 
graphics pipeline interface. Rendering 1 million triangles with 24 
bytes per triangle at 30Hz requires as much as 720 MB/sec memory bus 
bandwidth. This transfer rate is well beyond the current low-cost graphics 
systems. A solution is to compress the static 3D geometry as an off-line 
pre-process. Then, only the compressed geometry needs to be stored in main 
memory and sent down to the graphics pipeline for real-time decompression 
and rendering. We present several new techniques for compression of 3D 
geometry that produce 2 to 3 times better compression ratios than existing 
methods. We first introduce several algorithms for the efficient encoding 
of the original geometry as generalized triangle meshes. This encoding 
allows most of the mesh vertices to be reused when forming new triangles.
Our second contribution allows various parts of a geometric model to be
compressed with different precision depending on the level of details 
present. Together, our meshifying algorithms and the variable compression 
method achieve compression ratios of 30 and 37 to one over ASCII encoded 
formats and 10 and 15 to one over binary encoded triangle strips. Our 
experimental results show a dramatically lowered memory bandwidth required 
for real-time visualization of complex datasets. (Author abstract) 11 
Refs.
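
Note: the 720 MB/sec figure in the abstract above follows from simple arithmetic, reproduced in the sketch below. The function name and the `compression_ratio` parameter are hypothetical. Applying a 10:1 ratio to the 24-byte figure is purely illustrative, since the abstract reports its ratios relative to ASCII formats and binary triangle strips, not to this baseline.

```python
def bus_bandwidth_mb_per_s(triangles, bytes_per_triangle, frames_per_s,
                           compression_ratio=1.0):
    """Memory bus bandwidth, in MB/sec, needed to stream geometry each frame."""
    bytes_per_s = triangles * bytes_per_triangle * frames_per_s / compression_ratio
    return bytes_per_s / 1e6


# Figures from the abstract: 1 million triangles, 24 bytes each, 30 Hz.
uncompressed = bus_bandwidth_mb_per_s(1_000_000, 24, 30)

# Illustrative assumption only: the same stream at a hypothetical 10:1 ratio.
compressed = bus_bandwidth_mb_per_s(1_000_000, 24, 30, compression_ratio=10.0)
```

The uncompressed case reproduces the 720 MB/sec quoted in the abstract.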
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Abstract: Volume rendering is a class of algorithms for creating images 
from volume sampled data sets without computing intermediate surface 
representations. Because of the inherent O(N**3) run time, numerous
approximations are used to provide interactivity. One approach for high 
performance is parallelization on general purpose computers. We have 
developed a highly efficient, high fidelity approach that is called 
permutation warping. Permutation warping may use any one pass filter 
kernel, an example of which is trilinear reconstruction. Lacroute et al.'s
shear warp uses a bilinear multipass filter, for fewer operations, but an 
inferior transfer function. This paper discusses experiments in improving 
permutation warping using data dependent optimizations. We use a linear 
octree on each processor to encode coherent and empty regions efficiently, 
and to provide a means for adaptive resampling. Static load balancing is 
also used to redistribute nodes from a processor's octree to achieve
higher efficiencies. Performance timings from a 4096 processor MasPar 
MP-2 implementation show a 3 to 5 times speedup over brute force 
permutation warping, depending upon the dataset. Actual performance is 3 
to 4 frames/second on 128 x 128 x 128 volumes.
Because of the scalability of permutation warping, performance of 12-16 
frames/second is expected on a 16,384 processor machine. It is expected 
that implementation on more current SIMD or MIMD architectures would 
provide 30-60 frames/second on larger volumes. (Author abstract) 31 Refs. 
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Abstract: The paper describes a parallel implementation of a grand 
challenge problem: global atmospheric modeling. The novel contributions of 
our work include (1) a detailed investigation of opportunities for 
parallelism in atmospheric global modeling based on spectral solution 
methods, (2) the experimental evaluation of overheads arising from load 
imbalances and data movement for alternative parallelization methods, and 
(3) the development of a parallel code that can be monitored and steered 
interactively based on output data visualizations and animations of program 
functionality or performance. Code parallelization takes advantage of the
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relative independence of computations at different levels in the earth's 
atmosphere, resulting in parallelism of up to 40 processors, each 
independently performing computations for different atmospheric levels and 
requiring few communications between different levels across model time 
steps. Next, additional parallelism is attained within each level by taking 
advantage of the natural parallelism offered by the spectral computations 
being performed (e.g. taking advantage of independently computable terms in 
equations). Performance measurements are performed on a 64-node KSR2
supercomputer. However, the parallel code has been ported to several shared 
memory parallel machines, including SGI multiprocessors, and has also been 
ported to distributed memory platforms like the IBM SP-2. (Author abstract)
38 Refs. 
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Abstract: JPL ' s Remote Interactive Visualization and Analysis System 
(RIVA) is described in detail. The RIVA system integrates workstation 
graphics, massively parallel computing technology, and gigabit 
communication networks to provide a flexible interactive environment for 
scientific data perusal, analysis, and visualization. RIVA's kernel is a
highly scalable parallel perspective renderer tailored especially for the 
demands of large datasets beyond the sensible reach of workstations. Early 
experience with using RIVA to interactively explore and process 
multivariate, multiresolution datasets is reported; several examples using 
data from a variety of remote sensing instruments are discussed in detail 
and the results shown. Particular attention is placed on describing the 
algorithmic details of RIVA's parallel renderer kernel, with emphasis on 
the key aspects of achieving the algorithm's overall scalability. The paper 
summarizes the performance achieved for machine sizes up to more than 500 
nodes and for initial input image/terrain bases in the 2 Gbyte range. 
(Author abstract) 14 Refs. 
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Abstract: Scientific visualization and virtual reality have pushed 
three-dimensional graphics engines to their limits for updating scenes
in real-time. One bottleneck of graphic systems is the transformation of 
an object's vertices into normalized space based on an evaluated 
transformation stack. This operation is often done in floating point, 
requiring a fast floating point multiply-accumulate unit. This paper 
presents architectural optimizations to a graphics pipeline floating point 
multiply-accumulate unit by using block floating point and parallelism to 
bypass or merge trivial operations in the matrix multiplications. (Author 
abstract) 9 Refs. 
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Abstract: JPL 1 s Remote Interactive Visualization and Analysis System 
(RIVA) is described in detail. RIVA's kernel is a highly scalable 
perspective renderer tailored especially for the demands of large datasets 
beyond the sensible reach of workstations. The algorithmic details of this 
renderer are described, particularly the aspects key to achieving the 
algorithm's overall scalability. The paper summarizes the performance 
achieved for machine sizes up to more than 500 nodes and for initial 
input image/terrain bases of up to a gigabyte. The RIVA system integrates 
workstation graphics, massively parallel computing technology, and gigabit 
communication networks to provide a flexible interactive environment for 
scientific data perusal, analysis and visualization. Early experience
with using RIVA to interactively explore multivariate datasets is reported 
and some example results given. (Author abstract) 12 Refs. 
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Abstract: In this paper, a new sort-last parallel polygon rendering 
implementation is given for 2-D mesh message-passing architectures such as 
the Intel Delta and Paragon. Our implementation provides a very fast 
rendering rate for extremely large sets of polygons, a requirement of 
scientific visualization, CAD/CAM, and many other applications. We
implement and evaluate our scheme on the Intel Delta parallel computer at 
Caltech. Using 512 processors to render Eric Haines's SPD standard scenes, 
our scheme achieves a rendering rate of 2.8 - 4.0 million
triangles/second. (Author abstract) 21 Refs.
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Abstract: A large number of data-parallel applications can be represented 
as computational graphs from the perspective of parallel computing. The 
nodes of these graphs represent tasks that can be executed concurrently, 
while the edges represent the interactions between them. Further, the 
computational graphs derived from many applications are such that the 
vertices correspond to multi-dimensional coordinates, and the interaction 
between computations is limited to vertices that are physically 
proximate. In this paper we show that graphs with these properties can be 
transformed into simple architecture-independent representations that 
encapsulate the locality in these graphs. This representation allows a
fast mapping of the computational graph onto the underlying architecture at
the time of execution. This is necessary for environments where
available computational resources can be determined only at the time of 
execution or that change during execution. (Author abstract) 32 Refs. 
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Title: Recursive spectral algorithms for automatic domain partitioning in 
parallel finite element analysis 

Author: Hsieh, Shang-Hsien; Paulino, Glaucio H.; Abel, John F.
Corporate Source: Purdue Univ, West Lafayette, IN, USA 

Source: Computer Methods in Applied Mechanics and Engineering v 121 n 1-4 
Mar 1995. p 137-162 

Publication Year: 1995 

CODEN: CMMECC ISSN: 0045-7825 

Language: English 

Document Type: JA; (Journal Article) Treatment: T; (Theoretical) 
Journal Announcement: 9506W3

Abstract: Recently, several domain partitioning algorithms have been 
proposed to effect load-balancing among processors in parallel finite
element analysis. The recursive spectral bisection (RSB) algorithm [1]
has been shown to be effective. However, the
bisection nature of the RSB results in partitions of an integer power of 
two, which is too restrictive for computing environments consisting of an 
arbitrary number of processors. This paper presents two recursive spectral 
partitioning algorithms, both of which generalize the RSB algorithm for an 
arbitrary number of partitions. These algorithms are based on a graph 
partitioning approach which includes spectral techniques and graph 
representation of finite element meshes. The 'algebraic connectivity
vector' is introduced as a parameter to assess the quality of the
partitioning results. Both node-based and element-based partitioning
strategies are discussed. The spectral algorithms are also evaluated and 
compared for coarse-grained partitioning using different types of 
structures modelled by 1-D, 2-D and 3-D finite elements. (Author abstract) 
28 Refs. 
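
Note: a single bisection step of the kind RSB applies recursively can be sketched as follows. This is a minimal illustration of plain spectral bisection, not the generalized algorithms the paper proposes. It assumes a connected, undirected graph and approximates the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian L) by shifted power iteration; all function names are hypothetical.

```python
import math


def fiedler_vector(adj, iterations=500):
    """Approximate the Fiedler vector of a connected undirected graph.

    adj[v] lists the neighbours of vertex v (vertices are 0..n-1).
    Power iteration runs on (shift*I - L) with the constant vector
    projected out, so the iterate converges to the Fiedler vector.
    """
    n = len(adj)
    degree = [len(adj[v]) for v in range(n)]
    shift = 2 * max(degree) + 1          # exceeds every eigenvalue of L
    x = [float(i) for i in range(n)]     # deterministic, non-constant start
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]      # remove the constant component
        # (L x)[v] = degree[v] * x[v] - sum of x over the neighbours of v
        y = [shift * x[v] - (degree[v] * x[v] - sum(x[u] for u in adj[v]))
             for v in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x


def spectral_bisect(adj):
    """Split the vertices into two halves at the median Fiedler value."""
    f = fiedler_vector(adj)
    order = sorted(range(len(adj)), key=lambda v: f[v])
    half = len(order) // 2
    return sorted(order[:half]), sorted(order[half:])


# Two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3.
graph = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
part_a, part_b = spectral_bisect(graph)
```

On this small graph the split separates the two triangles, cutting only the bridging edge. Applying the step recursively yields 2, 4, 8, ... parts, which is the power-of-two restriction the abstract describes.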
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Source: Computer Methods in Applied Mechanics and Engineering v 119 n 3-4 
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Publication Year: 1994 

CODEN: CMMECC ISSN: 0045-7825 

Language: English 

Document Type: JA; (Journal Article) Treatment: A; (Applications); T; 
(Theoretical) 

Journal Announcement: 9505W1 

Abstract: Full three-dimensional analyses of ductile failure are carried 
out for tensile test specimens under dynamic loading, using a data parallel 
implementation of a ductile porous material model in a transient 3D finite 
element program. The elastic-viscoplastic material model accounts for 
ductile failure by the nucleation, growth and coalescence of micro-voids. 
Most of the results are obtained using 20 node isoparametric brick 
elements and reduced (2 x 2 x 2) quadrature. The
capabilities of the model are checked by a number of simulations for one 
layer of elements subject to overall plane strain conditions, compared to 
plane strain predictions. Comparisons are made with results using other 
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orders of interpolation and other quadrature rules. It is shown that the 
high order 3D elements give a good representation of shear 
localization. For a uniaxial tensile test specimen with a square 
cross-section, full three-dimensional computations are carried out with 
meshes consisting of many brick elements in each coordinate direction, and 
these analyses are used to study the final failure mode in the neck region. 
The scalability of the parallel implementation is verified and the 
performance with the porous plastic constitutive relation is compared with 
that obtained using a standard isotropic hardening model. (Author abstract) 
36 Refs. 
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Title: Distributed, parallel, interactive volume rendering package 

Author: Rowlan, John S.; Lent, G. Edward; Gokhale, Nihar; Bradshaw, 
Shannon 

Corporate Source: Argonne Natl Lab, Argonne, IL, USA 

Conference Title: Proceedings of the 1994 IEEE Visualization Conference 
Conference Location: Washington, DC, USA Conference Date: 
19941017-19941021 

Sponsor: IEEE; ACM; SIGGRAPH 
E.I. Conference No.: 42510 

Source: Proceedings Visualization 1994. IEEE, Los Alamitos, CA, 
USA, 94CH35707. p 21-30
Publication Year: 1994 
CODEN: 001061 ISSN: 1070-2385 
Language: English 

Document Type: CA; (Conference Article) Treatment: A; (Applications);
T; (Theoretical)

Journal Announcement: 9504W4 

Abstract: This paper presents a parallel ray-casting volume rendering 
algorithm and its implementation on the massively parallel IBM SP-1 
computer using the Chameleon message passing library. Though this algorithm 
takes advantage of many of the unique features of the SP-1 (e.g. high-speed 
switch, large memory per node, high-speed disk array, HIPPI display,
etc.), the use of Chameleon allows the code to be executed on any collection
of workstations. The algorithm is image-ordered and distributes the data 
and the computational load to individual processors. After the volume 
data is distributed, all processors then perform local raytracing of their 
respective subvolumes concurrently. No interprocess communication takes 
place during the ray tracing process. After a subimage is generated by each 
processor, the final image is obtained by composing subimages between all 
the processors. The program itself is implemented as an interactive process 
through a GUI residing on a graphics workstation which is coupled to the 
parallel rendering algorithm via sockets. The paper highlights the 
Chameleon implementation, the GUI, some optimization improvements, static 
load balancing, and direct parallel display to a HIPPI framebuffer. 
(Author abstract) 11 Refs. 
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Corporate Source: Univ of Durham, Durham, Engl 

Conference Title: Proceedings of the 7th Mediterranean Electrotechnical 
Conference - MELECON. Part 3 (of 3) 

Conference Location: Antalya, TURKEY Conference Date: 19940412-19940414 

Sponsor: IEEE; Middle East Technical University; Bilkent University; 
Chamber of Electrical Engineers of Turkey 

E.I. Conference No.: 42119 

Source: Mediterranean Electrotechnical Conference - MELECON 3 (of 3) 1994.
IEEE, Piscataway, NJ, USA, 94CH3388-6. p 980-983
Publication Year: 1994 
CODEN: 001676 
Language: English 

Document Type: CA; (Conference Article) Treatment: A; (Applications) 
Journal Announcement: 9503W3 

Abstract: Parallel simulation of power systems requires the system to be
partitioned into subnetworks which are processed on individual processors.
Maximum computational efficiency is achieved when the network is split such
that each processor has an equal computational load. This paper proposes
an automatic method of network partitioning which gives well balanced 
network splits, based upon an analysis of the factorisation tree for the 
system. The method also predicts the expected parallel speed-up for the 
split and allows the visualisation of large networks. A modified Minimum 
Degree Minimum Length node ordering algorithm is also presented which 
gives well balanced factorisation trees. (Author abstract) 13 Refs. 
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Title: Multimedia performance behavior of the GigaView parallel image 
server 

Author: Gennart, Benoit A.; Hersch, Roger D. 

Corporate Source: Ecole Polytechnique Federale de Lausanne, Lausanne,
Switz 

Conference Title: Proceedings of the 1994 13th Symposium on Mass Storage 
Systems 

Conference Location: Annecy, Fr Conference Date: 19940612-19940616
Sponsor: IEEE 

E.I. Conference No.: 21462 

Source: Digest of Papers - IEEE Symposium on Mass Storage Systems 1994. 
IEEE, Piscataway, NJ, USA, 94CH3457-9. p 90-98
Publication Year: 1994
CODEN: DPISDX ISSN: 1051-9173
Language: English

Document Type: CA; (Conference Article) Treatment: G; (General Review) 
Journal Announcement: 9502W4 

Abstract: Multimedia interfaces increase the need for large image 
databases, supporting the capability of storing and fetching streams of 
data with strict synchronicity and isochronicity requirements. In order to 
fulfill these requirements, the GigaView parallel image server 
architecture relies on arrays of intelligent disk nodes, with each disk
node being composed of one processor and one disk. This paper analyzes, 
through simulation, the real-time behavior of the GigaView in terms of 
delay and delay jitter. For a high-end GigaView architecture, consisting of 
16 disks and T9000 transputers, we evaluate stream frame access times under 
various parameters, such as load factors, frame size, stream throughput,
and synchronicity requirements. (Author abstract) 8 Refs.
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Title: Distributed sparse Gaussian elimination and orthogonal 
factorization 

Author: Raghavan, Padma 

Corporate Source: Univ of Illinois, Urbana, IL, USA 

Conference Title: Proceedings of the Scalable High-Performance Computing
Conference 

Conference Location: Knoxville, TN, USA Conference Date: 
19940523-19940525 

Sponsor: IEEE Computer Society 
E.I. Conference No.: 21190 

Source: Proceedings of the Scalable High-Performance Computing Conference
1994. IEEE, Los Alamitos, CA, USA. p 607-614 
Publication Year: 1994 
CODEN: 850ZA6 
Language: English 

Document Type: CA; (Conference Article) Treatment: G; (General Review); 
T; (Theoretical) 

Journal Announcement: 9501W1 

Abstract: We consider the solution of a linear system Ax = b on a
distributed memory machine when the matrix A has full rank and is large, 
sparse and nonsymmetric. We use our parallel Cartesian Nested Dissection 
algorithm to compute a fill-reducing ordering of A using a compact 
representation of the column intersection graph. We develop and
implement simple algorithms that use the resulting separator tree to 
estimate the structure of the factor and to distribute data and perform 
multifrontal numeric computations. When the matrix is nonsymmetric but 
square, the numeric computations involve Gaussian elimination with row 
pivoting; when the matrix is overdetermined, row-oriented Householder 
transforms are applied to compute the triangular factor of an orthogonal 
factorization. Our main contribution is the formulation of a fully 
parallel, unified approach to solving nonsymmetric sparse systems using 
either Gaussian elimination or orthogonal factorization and empirical 
results to demonstrate that the approach is effective both in reducing fill 
and achieving good parallel performance on an Intel iPSC/860. (Author 
abstract) 20 Refs. 
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Corporate Source: Australian Natl Univ, Canberra, Aust 

Conference Title: Proceedings of the Scalable High-Performance Computing
Conference 

Conference Location: Knoxville, TN, USA Conference Date: 
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Sponsor: IEEE Computer Society 
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Journal Announcement: 9501W1 

Abstract: A scheme for the visualization of large data volumes using 
volume rendering on a distributed memory MIMD system is described. The data 
to be rendered is decomposed into subvolumes to reside in the local 
memories of the system's nodes. A partial image of the local data is
generated at each node by ray tracing, and is then composited with 
partial images on other nodes in the correct order to generate the 
complete image. Subvolumes whose voxels are classified as being mapped to 
zero opacity are not rendered, giving rise to an imbalance of work amongst 
nodes. Scattered decomposition is used for load balancing, which on one
hand, creates additional overheads in compositing and communication, but on 
the other, provides an improvement in throughput that is dependent on the 
characteristics of the data. Experimental results for a typical data set 
rendered on a 1024-node Fujitsu AP1000 are reported. (Author abstract) 13 
Refs.
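
Note: the load-balancing effect of scattered decomposition described above can be shown with a small sketch. It assumes a one-dimensional row of subvolumes and a cyclic (round-robin) assignment as the scattering rule; the abstract does not give the actual mapping, and the function names and work values are hypothetical.

```python
def block_assign(num_subvolumes, num_nodes):
    """Contiguous decomposition: each node owns one run of neighbouring subvolumes."""
    per_node = -(-num_subvolumes // num_nodes)   # ceiling division
    return [i // per_node for i in range(num_subvolumes)]


def scattered_assign(num_subvolumes, num_nodes):
    """Scattered decomposition: subvolumes are dealt out to nodes cyclically."""
    return [i % num_nodes for i in range(num_subvolumes)]


def load_per_node(work, owner, num_nodes):
    """Total rendering work assigned to each node."""
    loads = [0] * num_nodes
    for w, node in zip(work, owner):
        loads[node] += w
    return loads


# Hypothetical costs: the first four subvolumes are fully transparent
# (zero opacity, so they are skipped); the last four are expensive.
work = [0, 0, 0, 0, 5, 5, 5, 5]

block_loads = load_per_node(work, block_assign(8, 2), 2)
scattered_loads = load_per_node(work, scattered_assign(8, 2), 2)
```

With the contiguous split one node does all the work, while the scattered split divides it evenly. The price, as the abstract notes, is that each node then holds several non-adjacent subvolumes, which adds compositing and communication overhead.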
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Corporate Source: Los Alamos Natl Lab, Los Alamos, NM, USA 
Conference Title: Proceedings of the 1994 Optical Fiber Communication 
Conference 

Conference Location: San Jose, CA, USA Conference Date: 
19940220-19940225 

Sponsor: Lasers and Electro-Optics Society of the IEEE; Optical Society 
of America; Communications Society of the IEEE 
E.I. Conference No.: 20322 

Source: Conference on Optical Fiber Communication, Technical Digest 
Series v 4 1994. Publ by IEEE, IEEE Service Center, Piscataway, NJ, USA. p 
64-65 

Publication Year: 1994 

CODEN: COFCEL ISBN: 1-55752-330-4 

Language: English 

Document Type: CA; (Conference Article) Treatment: A; (Applications);
G; (General Review); T; (Theoretical)
Journal Announcement: 9411W3 

Abstract: The Department of Energy High Performance Computing Research 
Center (HPCRC) at Los Alamos National Laboratory, one of two DOE-sponsored 
centers, employs gigabit-per-second networks to interconnect
high-performance computing systems, storage systems, and visualization
systems to create an integrated computational environment serving the needs 
of Grand Challenge-scale scientific applications. A diagram of this 
computational environment is shown in Fig. 1. The primary computational 
resource, a Thinking Machines Corp. 1024-node CM-5 with 32 Gbyte of
memory and four 800-Mbit/s high-performance parallel-interface (HIPPI)
channels, is used by several applications to model complex physical
processes. One of these applications, a state-of-the-art global ocean 
model, generates several hundred gigabytes of data during a long 
simulation. Global ocean models must run for several decades of simulated 
time because of the physical properties of the ocean. These calculations 
typically require hundreds of hours on the world's fastest supercomputers 
and consequently do not generate data at gigabit-per-second rates. The 
result of these calculations is a multigigabyte file containing the time-
and space-dependent values of various physical properties, such as 
temperature and velocity. These results are typically visualized on a 
high-resolution frame buffer driven at gigabit-per-second rates. A fast 
disk system is used to store the data and to stream it to the frame buffer. 
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Moving data from mass storage to a frame buffer at near gigabit-per-second 
rates motivated the design of a new file-system architecture. This 
architecture eliminates the traditional mainframe bottleneck between the 
disk storage devices and the network by attaching the storage devices 
directly to the network. We have achieved rates of 60 Mbit/s from a RAID 
disk array to a frame buffer system attached to a HIPPI-based network. 
Gigabit-per-second networks are permitting new approaches to file system 
architectures and visualization systems. These high-performance file 
systems and visualization systems, coupled with supercomputers, provide 
powerful tools in the quest for solutions to Grand Challenge-scale 
problems. (Author abstract) 
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Sponsor: IEEE Computer Society : 
E.I. Conference No.: 19912 
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Language: English 

Document Type: CA; (Conference Article) Treatment: G; (General Review); 
T; (Theoretical) 

Journal Announcement: 9404W1 

Abstract: This paper presents efficient parallel (hypercube and 
EREW-PRAM) algorithms for building pointer-based and linear quadtrees from 
boundary/chain code image representation. For the input boundary code
of length O(b) and the height O(h) of the output quadtree, our EREW-PRAM
algorithm takes O(h + logb) time and O(b) processors for quadtree
building from boundary code; this improves upon a previously published
CREW-PRAM algorithm requiring O(h*logb) time and O(b) processors. For the
same task, our hypercube algorithm takes O(h*logb) time and O(b)
processors, which also improves upon a previously published hypercube
algorithm requiring O(logb(h + log**2 logb)) time and O(b) processors.
The algorithms, presented here, use a direct and simple sibling finding 
technique for quadtrees; our technique exploits regularity in quadtree data 
structure, and it is applicable to any k-ary tree for which some 
(arbitrary) ordering exists among child nodes of a parent node. (Author
abstract) 17 Refs. 
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Publication Year: 1993 

CODEN: CPEXEI ISSN: 1040-3108 

Language: English 

Document Type: JA; (Journal Article) Treatment: T; (Theoretical); A; 
(Applications) 

Journal Announcement: 9310W2 

Abstract: A solution is proposed to the problem of interactive 
visualization and rendering of volume data. Designed for parallel 
distributed memory (MIMD) architectures, the volume rendering system is 
based on the ray tracing (RT) visualization technique, the Sticks 
representation scheme (a data structure exploiting data coherence for the 
compression of classified data sets), the use of a slice-partitioning 
technique for the distribution of the data between the processing nodes 
and the consequent ray-data-flow parallelizing strategy. The system has 
been implemented on two different architectures: an inmos Transputer 
network and a hypercube nCUBE 6400 architecture. The high number of 
processors of this latter machine has allowed us to exploit a second level
of parallelism (parallelism on image space, or parallelism on pixels) in 
order to arrive at a higher degree of scalability. In both proposals, the 
similarities between the chosen data-partitioning strategy, the 
communications pattern of the visualization processes and the topology of 
the physical system architecture represent the key points and provide 
improved software design and efficiency. Moreover, the partitioning 
strategy used and the network interconnection topology reduce the 
communications overhead and allow for an efficient implementation of a 
static load-balancing technique based on the prerendering of a low
resolution image. Details of the practical issues involved in the 
parallelization process of volumetric RT, commonly encountered problems 
(i.e. termination and deadlock prevention) and the sw migration process 
between different architectures are discussed. (Author abstract) 21 Refs. 
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Author: Maehle, Erik; Obeloer, Wolfgang 

Corporate Source: Universitat-GH-Paderborn, Paderborn, Germany 
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Publication Year: 1992 
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Language: English 

Document Type: JA; (Journal Article) Treatment: X; (Experimental); A; 
(Applications) 

Journal Announcement: 9305 

Abstract: Monitoring tools are important parts of future programming 
environments for parallel computers. In this paper the software monitor 
DELTA-T is presented which has been developed for performance monitoring 
of (standard) multi-transputer systems at the University of Paderborn. 
Instrumentation is implemented by 'spy' processes which are inserted into 
the target system either to observe it at the node or at the process 
level. Measurement traces generated by these spies are buffered locally in 
the node memories. A global system view is achieved by time-stamping the 
recorded events with a globally valid system time. Evaluation is carried 
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out offline on a host workstation either with an animation tool or an 
interactive graphical visualization tool. (Author abstract) 14 Refs. 
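
The global system view described in this abstract depends on combining the locally buffered traces into one stream ordered by the global time stamp. A minimal sketch of that merge step in Python, assuming each node's trace is already sorted by time stamp; the tuple layout and the name `merge_traces` are illustrative and are not taken from DELTA-T:

```python
import heapq

def merge_traces(node_traces):
    """Merge per-node traces into one list ordered by global time stamp.

    node_traces maps a node id to a list of (timestamp, event_name)
    tuples, each list already sorted by timestamp (hypothetical layout).
    """
    tagged = (
        [(ts, node, name) for ts, name in trace]
        for node, trace in sorted(node_traces.items())
    )
    # heapq.merge performs a lazy k-way merge of already sorted inputs.
    return list(heapq.merge(*tagged))

traces = {
    0: [(1, "send"), (5, "recv")],
    1: [(2, "recv"), (3, "send")],
}
merged = merge_traces(traces)
```

The merge is only meaningful if the time stamps really are globally valid, which is why the abstract stresses a common system time.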
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Conference Title: Proceedings of the IEEE SOUTHEASTCON 1 92 
Conference Location: Birmingham, AL, USA Conference Date: 19920412 
Sponsor: IEEE Alabama Section; IEEE Region 3 
E.I. Conference No.: 17598 

Source: Conference Proceedings - IEEE SOUTHEASTCON v 2. Publ by IEEE, 
IEEE Service Center, Piscataway, NJ, USA (IEEE cat n 92CH3094-0) . p 724-729 
Publication Year: 1992 
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Document Type: PA; (Conference Paper) Treatment: T; (Theoretical); A; 
(Applications) 
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Abstract: A hybrid performance monitor developed for MSPARC, a 
mesh-connected, message-passing multicomputer, is described. The 
development of the hybrid performance monitor was a cross-disciplinary 
enterprise requiring custom hardware and a range of software support 
including monitor code, driver interfaces, probe history acquisition and 
processing, graphical display, and application probe injection. 
Programmable hardware was designed to unobtrusively collect events on each 
node and maintain their accurate chronological order. This distributed 
collection system was coupled by its independent network to a central 
monitor where data selection and presentation techniques played an 
important role in the visualization of the parallel system's execution. 
13 Refs. 



11/7/29 (Item 22 from file: 8) 

DIALOG (R) File 8 : Ei Compendex(R) 

(c) 2000 Engineering Info. Inc. All rts. reserv. 

03481275 E.I. Monthly No: EI9209110643 
Title: Using visualization tools to understand concurrency. 
Author: Zernik, Dror; Snir, Marc; Malki, Dalia 
Corporate Source: Dept of Electr Eng, Technion, Haifa, Israel 
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Document Type: JA; (Journal Article) Treatment: A; (Applications); T; 
(Theoretical); X; (Experimental) 
Journal Announcement: 9209 

Abstract: A visualization tool that provides an aggregate view of 
execution through a graph of events called the causality graph, which is 
suitable for systems with hundreds or thousands of processors, 
coarse-grained parallelism, and for a language that makes communication and 
synchronization explicit, is discussed. The methods for computing causality 
graphs and stepping through an execution with causality graphs are 
described. The properties of the abstraction algorithms and super nodes, 
the subgraphs in causality graphs, are also discussed. 4 Refs. 
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Publication Year: 1989 

CODEN: NNETEB ISSN: 0893-6080 

Language: English 

Document Type: JA; (Journal Article) Treatment: T; (Theoretical); X; 
(Experimental) 

Journal Announcement: 9011 

Abstract: The recurrent back-propagation algorithm for neural networks 
has been implemented on the Connection Machine, a massively parallel 
processor. Two fundamentally different graph architectures underlying the 
nets were tested: one based on arcs, the other on nodes. Confirming the 
predominance of communication over computation, performance measurements 
underscore the necessity to make connections the basic unit of 
representation. Comparisons between these graph algorithms lead to 
important conclusions concerning the parallel implementation of neural nets 
in both software and hardware. (Author abstract) 16 Refs. 
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Abstract: A microcomputer software package that provides a computer 
graphics simulation of electrical power systems in a parallel-processing 
configuration is described. It provides a user-friendly environment to 
configure the electrical network so that hypothetical or actual substations 
and transmission lines can be added or deleted to simulate any change in 
the network's performance. The package runs in parallel with the load-flow 
program and is used as a computer-aided design tool to provide a 
graphical representation of the Lebanese electrical network, which is 
composed of 127 nodes. The choice and implementation of the hierarchical 
data structure used to store the network's model are examined. 
Graphic-oriented reports, context-sensitive HELP, and online documentation 
make the package a powerful tool that can analyze anything from a few buses 
to an entire network. An important feature of the package is its fast 
throughput of execution. 4 Refs. 
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Abstract: A broadly applicable approach to mapping of parallel 
computations on multiprocessors is described, and the related mapping 
algorithms are briefly sketched. The approach begins with a graph 
representation of a parallel computation and first generates a reduced 
graph by merging nodes with high internode communication cost through 
iterative use of a critical-path algorithm. This graph is then mapped to a 
graphical representation of a multiprocessor architecture by the mapping 
algorithms. These algorithms attempt to minimize the total execution 
time, including both computation and communication times. The algorithms, 
while they are heuristic rather than true optimal algorithms, are shown to 
yield excellent results in example applications and have modest execution 
costs. 23 Refs. 
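
The abstract only sketches the graph-reduction step, so the following Python fragment is an illustration rather than the authors' algorithm. It assumes a DAG with per-node computation times and per-edge communication times, finds the critical path, and repeatedly merges the two endpoints of the costliest communication edge on that path (merged nodes communicate for free). The names `critical_path` and `reduce_graph` are hypothetical, and serialization of independent nodes placed in the same cluster is ignored for brevity.

```python
def critical_path(order, cost, comm, cluster):
    """Return (length, path) of the longest path through the DAG.

    order   : nodes in topological order
    cost    : node -> computation time (assumed positive)
    comm    : (u, v) -> communication time on edge u -> v
    cluster : node -> cluster id; edges inside a cluster cost nothing
    """
    finish, pred = {}, {}
    for v in order:
        best, best_u = 0, None
        for (u, w), c in comm.items():
            if w != v:
                continue
            delay = 0 if cluster[u] == cluster[v] else c
            if finish[u] + delay > best:
                best, best_u = finish[u] + delay, u
        finish[v] = best + cost[v]
        pred[v] = best_u
    end = max(order, key=lambda v: finish[v])
    path = [end]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return finish[end], path[::-1]


def reduce_graph(order, cost, comm):
    """Greedily merge the endpoints of the costliest critical-path edge."""
    cluster = {v: v for v in order}
    length, path = critical_path(order, cost, comm, cluster)
    while True:
        edges = [(u, v) for u, v in zip(path, path[1:])
                 if cluster[u] != cluster[v]]
        if not edges:
            break
        u, v = max(edges, key=lambda e: comm[e])
        # Tentatively fold v's cluster into u's cluster.
        trial = {n: (cluster[u] if c == cluster[v] else c)
                 for n, c in cluster.items()}
        new_length, new_path = critical_path(order, cost, comm, trial)
        if new_length >= length:
            break  # merging no longer shortens the critical path
        cluster, length, path = trial, new_length, new_path
    return cluster, length


order = ["a", "b", "c", "d"]
cost = {"a": 2, "b": 3, "c": 1, "d": 2}
comm = {("a", "b"): 5, ("a", "c"): 1, ("b", "d"): 4, ("c", "d"): 1}
cluster, length = reduce_graph(order, cost, comm)
```

In this small example the chain a, b, d ends up in one cluster while c stays separate, which is the kind of reduced graph that a later mapping step would place on processors.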
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Exploiting parallelism is an essential part of maximizing the 
performance of an application on a parallel computer. Parallelism is 
traditionally exploited at two granularities: individual operations are 
executed in parallel within a processor to exploit instruction-level 
parallelism and loop iterations or processes are executed in parallel on 
different processors to exploit loop-level parallelism and process-level 
parallelism. 

A new generation of architectures that execute multiple instruction 
streams on a single chip has the potential of significantly reducing the 
gap between communication costs within a processor and between processors. 
This means that parallelism of multiple granularities can be exploited 
between instruction streams by overlapping regions of code that range in 
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granularity from a small set of instructions to basic blocks, conditionals, 
loop iterations, loop nests, procedure calls, and collections of such 
constructs. This opens the way to exploiting more parallelism in a larger 
number of applications than has been feasible in the past. Furthermore, it 
creates a demand for compilation techniques which exploit multi-grained 
parallelism, that is, the overlap of program regions of different 
granularities. 

This thesis studies the exploitation of multi-grained parallelism. It 
presents a program representation called the program dependence graph 
(PDG) and a node labeling scheme that supplements it. These 
representations have been specialized to expose multi-grained parallelism 
and facilitate its exploitation on a multiple-instruction-stream 
architecture. The thesis investigates novel compilation techniques for 
exploiting multi-grained parallelism and explores the impact of 
synchronization cost on performance. These techniques perform 
partitioning, scheduling and synchronization of a single application for a 
multiple-instruction-stream architecture. The partitioning techniques make 
global trade-offs to select the granularity of parallelism to exploit in 
each part of the program so as to minimize the overall latency for a target 
architecture. The thesis describes an implementation of these 
representations and techniques called Pedigree, which is the first 
post-pass, retargetable compiler to target multiple-instruction-stream 
architectures. The SDIO and some SPEC benchmarks have been compiled by 
Pedigree and used to demonstrate its ability to parallelize code. The best 
results for exploiting multi-grained parallelism come from overlapping 
parallelized loop nests, something which is new to this work. 
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This dissertation identifies a class of parallel polygon rendering 
algorithms suitable for interactive use on multicomputers, and presents a 
methodology for designing efficient algorithms within that class. The 
methodology was used to design a new polygon rendering algorithm that uses 
the frame-to-frame coherence of the screen image to evenly partition the 
rasterization at reasonable cost. An implementation of the algorithm on the 
Intel Touchstone Delta at Caltech, the largest multicomputer at the time, 
renders 3.1 million triangles per second. The rate was measured using a 
306,640 triangle model and 512 i860 processors, and includes back-facing 
triangles. A similar algorithm is used in Pixel-Planes 5, a system that 
has specialized rasterization processors, and which, when introduced, had a 
benchmark score for the SPEC Graphics Performance Characterization Group 
"head" benchmark that was nearly four times faster than commercial 
workstations. The algorithm design methodology also identified significant 
performance improvements for Pixel-Planes 5. 

All fully parallel polygon rendering algorithms have a sorting step to 
redistribute primitives or fragments according to their screen location. 
The algorithm class mentioned above is one of four classes of parallel 
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rendering algorithms identified; the classes are differentiated by the type 
of data that is communicated between processors. The identified algorithm 
class, called sort-middle, sorts screen-space primitives between the 
transformation and rasterization. 

The design methodology uses simulations and performance models to 
help make the design decisions. The resulting algorithm partitions the 
screen during rasterization into adaptively sized regions with an average 
of four regions per processor. The region boundaries are only changed when 
necessary: when one region is the rasterization bottleneck. On smaller 
systems, the algorithm balances the loads by assigning regions to 
processors once per frame, using the assignments made during one frame in 
the next. However, when 128 or more processors are used at high frame 
rates, the load balancing may take too long, and so static load 
balancing should be used. Additionally, a new all-to-all communication 
method improves the algorithm's performance on systems with more than 64 
processors. 
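
The per-frame load balancing described above can be pictured as a small assignment problem: region costs measured in one frame decide which processor gets which region in the next. The abstract does not name the heuristic used, so the Python sketch below assumes a greedy longest-processing-time-first rule; `assign_regions` and the cost figures are hypothetical.

```python
import heapq

def assign_regions(region_costs, num_processors):
    """Assign regions to processors using last frame's measured costs.

    region_costs maps a region id to its rasterization time in the
    previous frame. Regions are handed out from most to least costly,
    each to the processor that is currently least loaded.
    """
    loads = [(0.0, p) for p in range(num_processors)]
    heapq.heapify(loads)
    assignment = {}
    for region, cost in sorted(region_costs.items(),
                               key=lambda rc: (-rc[1], rc[0])):
        load, proc = heapq.heappop(loads)
        assignment[region] = proc
        heapq.heappush(loads, (load + cost, proc))
    return assignment, max(load for load, _ in loads)

# Eight regions on two processors (four regions per processor on average).
costs = {0: 9.0, 1: 7.0, 2: 6.0, 3: 5.0, 4: 4.0, 5: 3.0, 6: 2.0, 7: 2.0}
assignment, makespan = assign_regions(costs, 2)
```

Having several regions per processor is what gives a greedy rule like this enough freedom to even out the load.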
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There is a broad consensus that major discoveries in key applications, 
an inventory of which is given in the report "Grand Challenges 
1993: High Performance Computing and Communications" by the Committee on 
Physical, Mathematical and Engineering Sciences, would be within reach if 
computers 1000 times faster than today's conventional supercomputers 
existed, assuming equal progress in algorithms, software to exploit that 
computing power and visualization techniques to represent the results of 
the computations. 

Computers containing large numbers of processing nodes are required 
to study these "Grand Engineering Challenges". Such teraflop machines will 
be massively parallel, involving thousands of coupled nodes, solving 
problems containing trillions of data points. However, large numbers of 
nodes linked by conventional busses suffer from communication congestion 
caused by bus contention, which is known as the von Neumann bottleneck, 
as the bus must sequentialise many parallel data exchanges. 

Optical Processing, which includes opto-electronic conversion, strives 
for as much parallelism as possible in storing and manipulating optical 
information. A new class of optically writable and electrically readable 
logic elements was introduced. Arrays of such elements, processed in 
bipolar as well as in CMOS, were realized with the IC group of the Faculty 
of Electrical Engineering. 

Optical Networking applied in such massively parallel computers not 
only involves the choice of a suitable transport medium but also aims 
at an optimum exploration and exploitation of its inherent parallelism. The 
investigation of the inherent parallelism of optics, involving billions of 
trajectories, is an important subject for study in the final years of the 
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twentieth century and beyond. It is foreseen that novel routing techniques 
can improve the performance of massively parallel processors when 
linking crates, nodes, chips or gates optically. The necessity of 
utilizing optical interconnects becomes crucial when large numbers of 
computing nodes are involved. 

An optical free space data distributing system, enabling simultaneous 
communication between nine nodes, each of them producing 64 bits of 
information, was developed with the Institute of Applied Physics TPD/TNO. 

The objective is to realize an Opto Electronic Processing & Networking 
system prototype, suitable to be implemented in a fully connected multiple 
instruction, multiple data stream (MIMD) architecture, containing 1024 or 
more computing nodes. 
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Ranked data arise when some group of judges is asked to rank order a 
set of n items according to some preference function. A judge's ranking is 
denoted by a vector $x = (x_1, \ldots, x_n)$, where $x_i$ is the rank 
assigned to item i. If we treat these vectors as points in $\mathbb{R}^n$, we 
are led to consider the geometric structure encompassing the collection of 
all such vectors: the convex hull of the n! points in $\mathbb{R}^n$ whose 
coordinates are permutations of the first n integers. These structures are 
known as permutation polytopes. 
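
As a concrete illustration of the definition above, the vertices of the permutation polytope can be enumerated directly for small n. This is a minimal Python sketch; the function name `polytope_vertices` is illustrative and does not come from the thesis.

```python
from itertools import permutations

def polytope_vertices(n):
    """Vertices of the permutation polytope: all rankings of n items."""
    return list(permutations(range(1, n + 1)))

vertices = polytope_vertices(3)
# Every vertex satisfies x_1 + ... + x_n = n(n+1)/2, so the polytope
# lies in an (n-1)-dimensional hyperplane of R^n.
coordinate_sums = {sum(v) for v in vertices}
```

For n = 3 the six vertices all have coordinate sum 6, so the polytope is a planar hexagon embedded in three-dimensional space.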

The use of such structures for the analysis of ranked data was first 
proposed by Schulman [65]. Geometric constraints on the 
shapes of the permutation polytopes were later noted by McCullagh 
[56]. Thompson [77] advocated using the 
permutation polytopes as outlines for high-dimensional "histograms", and 
generalized the class of polytopes to deal with partial rankings (ties 
allowed). 

Graphical representation of ranked data can be achieved by putting 
varying masses at the vertices of the generalized permutation polytopes. 
Each face of the permutation polytope has a specific interpretation; for 
example, item i being ranked first. The estimation of structure in ranked 
data can thus be transformed into geometric (visual) problems, such as the 
location of faces with the highest concentrations of mass. 

This thesis addresses various problems in the context of such a 
geometric framework: the automation of graphical displays of the 
permutation polytopes; illustration and estimation of parametric models; 
and smoothing methods using duality—where every face is replaced with a 
point. A new way of viewing the permutation polytopes as projections of 
high-dimensional hypercubes is also given. The hypercubes are built as 
Cartesian products of the $\binom{n}{2}$ possible paired comparisons, and 
as such lead to methods for building rankings from collections of paired 
comparisons. 
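
To make the link between paired comparisons and rankings tangible, the sketch below enumerates every vertex of the paired-comparison hypercube for n = 3 and maps it to a rank vector by counting, for each item, how many items are preferred to it. The abstract does not specify the projection used in the thesis; this counting map is an assumption chosen for simplicity, and `ranks_from_comparisons` is a hypothetical name.

```python
from itertools import combinations, product

def ranks_from_comparisons(n, outcome):
    """Map one hypercube vertex to a rank vector.

    outcome[(i, j)] is 1 if item i is preferred to item j, else 0,
    for each pair i < j. An item's rank is 1 plus the number of items
    preferred to it.
    """
    beaten_by = [0] * n
    for (i, j), i_wins in outcome.items():
        if i_wins:
            beaten_by[j] += 1
        else:
            beaten_by[i] += 1
    return tuple(1 + b for b in beaten_by)

n = 3
pairs = list(combinations(range(n), 2))
images = [
    ranks_from_comparisons(n, dict(zip(pairs, bits)))
    for bits in product((0, 1), repeat=len(pairs))
]
# Consistent (transitive) sets of comparisons land on true rankings.
rankings = {x for x in images if sorted(x) == list(range(1, n + 1))}
```

Of the eight hypercube vertices, six are consistent and map onto the six rankings; the two cyclic preference patterns both map to the centre point (2, 2, 2) of the hexagon.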
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The objective of the research reported in this thesis is to develop a 
set of techniques to automatically find rate-optimal or near-rate-optimal 
implementations in parallel/pipelined processing environments for DSP 
algorithms that are represented by recursive shift-invariant flow graphs. 
The parallel/pipelined processing environments are synchronous 
parallel processing systems that consist of one or more processors where 
each processor could be internally pipelined. The shift-invariant flow 
graph is a graphical representation which describes the computational 
structures of a broad class of DSP algorithms at the fine-grain level. Since the 
node execution times in the defining flow graphs are deterministic, this 
research addresses compile-time scheduling. 
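
"Rate optimal" is usually measured against the iteration period bound of a recursive flow graph: the largest ratio, over all directed loops, of total node execution time in the loop to the number of delay elements in the loop. The abstract does not state this bound explicitly, so the Python sketch below is a standard textbook computation, not the thesis's method. It assumes integer execution times, and the names and the example graph are hypothetical.

```python
from fractions import Fraction

def iteration_period_bound(exec_time, edges):
    """Largest ratio (loop computation time) / (loop delay count).

    exec_time : node -> integer execution time
    edges     : list of (src, dst, delays) with integer delays >= 0
    Enumerates simple loops by depth-first search, so it is only
    suitable for small graphs.
    """
    nodes = sorted(exec_time)
    index = {v: i for i, v in enumerate(nodes)}
    succ = {v: [] for v in nodes}
    for u, v, d in edges:
        succ[u].append((v, d))

    best = Fraction(0)

    def walk(start, node, time, delays, on_path):
        nonlocal best
        for nxt, d in succ[node]:
            if nxt == start:
                if delays + d == 0:
                    raise ValueError("delay-free loop is not computable")
                best = max(best, Fraction(time, delays + d))
            elif index[nxt] > index[start] and nxt not in on_path:
                # Only visit nodes ordered after the start node so that
                # each loop is counted once.
                walk(start, nxt, time + exec_time[nxt], delays + d,
                     on_path | {nxt})

    for s in nodes:
        walk(s, s, exec_time[s], 0, {s})
    return best

# Hypothetical second-order recursive filter: one adder, two multipliers.
exec_time = {"A": 1, "M1": 2, "M2": 2}
edges = [("A", "M1", 1), ("M1", "A", 0), ("A", "M2", 2), ("M2", "A", 0)]
bound = iteration_period_bound(exec_time, edges)
```

Here the loop through M1 (time 3, one delay) dominates the loop through M2 (time 3, two delays), so no schedule can start iterations more often than once every 3 time units.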

The research in this thesis can be divided into three areas. First, an 
instruction scheduling methodology for a single pipelined processor is 
presented. In such a case, the problem to be addressed is the scheduling of a 
single instruction stream which controls all of the pipeline stages. The 
goal of an automatic scheduler in this context is to rearrange the order of 
instructions such that they are executed in minimum time and with no 
pipeline faults. In other words, sequences of instructions are ordered to 
minimize the iteration period between successive iterations of the defining flow 
graphs. 

Second, a new class of multiprocessor system, called Clock-Skewed 
Parallel Processing system, is proposed. This system provides an elegant 
solution to interprocessor communication problems in multiprocessor systems. 
The interprocessor communication strategy described in this system is a 
combination of a synchronous multiprocessor architecture, an associated 
interprocessor communication architecture, and a multiprocessor compiler 
which considers the interprocessor communication to be a scheduling 
constraint. This system not only can handle the interprocessor 
communications very efficiently but also can explicitly incorporate the 
interprocessor communication time delay into the parallel scheduling model. 

Third, an instruction scheduling methodology for a multiple pipelined 
processing system is presented. In this system, since more than one 
pipelined processor is involved in parallel processing, all the 
processors must be interconnected in some manner. In such processing 
environments, interprocessor communication joins instruction 
scheduling as a major problem. This research presents a system scheduler 
which combines the instruction scheduling methodology for a single 
pipelined processor and the interprocessor communication strategy in the 
clock-skewed parallel processing system. This system has a simple 
interprocessor communication structure which can provide good performance 
and which results in scheduling constraints that can be reasonably 
integrated into the searching algorithms of an optimal compiler. 

11/7/38 (Item 6 from file: 35) 

DIALOG (R) File 35 : DISSERTATION ABSTRACTS ONLINE 
(c) 2000 UMI . All rts. reserv. 

01164804 ORDER NO: AAD91-20229 

TECHNIQUES FOR PARALLEL GEOMETRIC COMPUTATIONS (PARALLEL ALGORITHMS) 



25 May 10, 2000 10:56 



Ginger Roberts - Search Report 



Author: KANKANHALLI, MOHAN S. 
Degree: PH.D. 
Year: 1990 

Corporate Source/Institution: RENSSELAER POLYTECHNIC INSTITUTE (0185) 
Adviser: WM. RANDOLPH FRANKLIN 

Source: VOLUME 52/02-B OF DISSERTATION ABSTRACTS INTERNATIONAL. 
PAGE 932. 176 PAGES 

Parallel Computing is one solution for efficient processing of the 
large geometric databases encountered nowadays. This thesis presents the 
Uniform Grid and Vertex Neighborhood techniques for performing geometric 
operations in parallel. These techniques have several desirable properties. 
Their average execution time rises linearly with the sum of the input 
and output. They require little global information, which reduces the 
interprocessor communications cost. They exploit features of modern 
machines, such as a large flat address space, to avoid a log factor in 
times. The algorithms have simple data structures leading to ease of 
implementation. This research shows the broad applicability of these 
techniques by developing solutions for geometric problems in diverse 
application domains. These techniques have been used to develop efficient 
algorithms for Visible Surface Determination in Visualization and 
Iso-rectangle problems in VLSI. The Parallel Object-Space Visible Surface 
Determination algorithm has an expected time complexity of 
$O\left(\frac{n}{p} + k \log_2 k\right)$, where $n$ is the number of input edges and $k$ the 
number of visible segments, assuming a CREW PRAM model of computation with 
$p$ processors. The implementation on a shared-memory Sequent Balance 21000 
shows an average speedup of 10 using 15 processors. The parallel 
algorithm for computing the area of the union of a set of iso-rectangles 
has an expected time complexity of $O\left(\frac{n+k}{p} + \log_2 p\right)$ for a 
data set with $n$ edges and $k$ intersections on a $p$-processor machine. 
The Connection Machine implementation of the algorithm also exhibits good 
performance. This demonstrates the practical use of the techniques on 
different parallel architecture paradigms. 
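
The abstract does not spell out how the Uniform Grid technique works. The general idea behind a uniform grid is to overlay the scene with equal-sized cells, record which objects touch each cell, and then test only objects that share a cell; because cells are independent, the work divides naturally among processors. The Python sketch below illustrates that idea for line segments under simplifying assumptions (bounding-box cell coverage, no exact intersection test); `candidate_pairs` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(segments, cell_size):
    """Return index pairs of segments that share at least one grid cell.

    segments : list of ((x1, y1), (x2, y2)) endpoints
    Each segment is entered into every cell overlapped by its bounding
    box, a conservative simplification.
    """
    grid = defaultdict(list)
    for idx, ((x1, y1), (x2, y2)) in enumerate(segments):
        cx_lo, cx_hi = sorted((int(x1 // cell_size), int(x2 // cell_size)))
        cy_lo, cy_hi = sorted((int(y1 // cell_size), int(y2 // cell_size)))
        for cx in range(cx_lo, cx_hi + 1):
            for cy in range(cy_lo, cy_hi + 1):
                grid[(cx, cy)].append(idx)
    pairs = set()
    # Cells are independent, so this loop could be split across processors.
    for members in grid.values():
        pairs.update(combinations(members, 2))
    return pairs

segments = [((0, 0), (1, 1)), ((0, 1), (1, 0)), ((5, 5), (6, 6))]
pairs = candidate_pairs(segments, cell_size=2)
```

Only the two crossing segments near the origin are paired; the distant third segment is never compared with them, which is where the savings over an all-pairs test come from.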
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Multipurpose batch plants are often designed to concurrently produce a 
number of high-value products using a common collection of batch and 
semicontinuous processes. Because of their increased use in a number of 
emerging, small scale chemical industries, the design and planning of batch 
plants are receiving increased attention. 

The objective of this work is to develop a general model for the 
optimization and control of complex multipurpose batch plants. 

The system is modelled as a discrete event system by using Minimax 
Algebra which provides a framework to write linear equations that describe 
the performance of the plant. The state variables in these equations 
quantify the time at which designated events occur. 
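
In a minimax (max-plus) algebra, "addition" is the maximum and "multiplication" is ordinary addition, so a recurrence of event times that involves waiting for the latest of several predecessors becomes a linear equation x(k+1) = A ⊗ x(k). The Python sketch below shows this for a hypothetical two-unit plant; the matrix entries are invented for illustration and are not taken from the thesis.

```python
NEG_INF = float("-inf")

def maxplus_matvec(A, x):
    """Max-plus product: y[i] = max_j (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]

# x[i] is the start time of the current batch on unit i.
# A[i][j] is the minimum time between the start of batch k on unit j
# and the start of batch k+1 on unit i; NEG_INF means "no constraint".
A = [
    [3.0, NEG_INF],  # unit 0 is busy for 3 h per batch
    [6.0, 2.0],      # unit 1 needs batch k+1 from unit 0 (3 h + 3 h)
                     # and must finish its own 2 h batch first
]
x0 = [0.0, 3.0]      # start times of the first batch on each unit
x1 = maxplus_matvec(A, x0)
x2 = maxplus_matvec(A, x1)
```

Each application of the matrix advances the plant by one batch, and the steady increase of 3 h per step shows that unit 0 is the bottleneck in this example.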

A multipurpose batch plant is viewed as a network of activities. The 
sequencing graph of the system helps determine an effective set of 
variables that denote the nodes and the arcs of the graph. The 
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advantage of this representation is that the resulting model is very 
general and can describe many process features such as storage, equipment 
setup times, parallel processing at the same stage, and alternative 
production routes. The model of the system is formulated as a mixed-integer 
linear program. Binary variables denote the arcs of the graph and represent 
the processing sequence in each unit. Continuous variables denote the 
nodes of the graphs and represent the starting times of the activities. 

The Model Predictive Control (MPC) of multipurpose batch plants is 
also discussed. MPC uses an explicit and separately identifiable model and 
optimizes an open-loop system to implement closed-loop control. The on-line 
calculation involves the solution of a mixed-integer linear program. The 
computed control consists of selecting the sequence, timing, and processing 
paths required to bring the system from a measured state to a final state. 
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We present the techniques of adaptive blocking and incremental 
condition estimation which we believe to be useful for the computation of 
common matrix decompositions in high-performance environments. We apply 
these new techniques to algorithms for computing the Householder QR 
factorization with and without pivoting on a coarse-grained distributed 
system. For reasons of portability, we use a pipelined scheme on a ring of 
processors as the basis of our algorithms. 

To take advantage of possible floating point hardware on each node, 
we develop a blocked version of the pipelined Householder QR algorithm that 
employs the compact WY representation for products of Householder 
matrices. While a strategy involving blocks of fixed width leads to 
increased floating point utilization per node, it also leads to increased 
load imbalance. To reconcile this tradeoff we introduce a variable width 
blocking strategy based on a model of the critical path of the algorithm. 
The resulting adaptive blocking strategy provides for good floating point 
performance per node while maintaining overall load balance. 
Experimental results on the Intel iPSC hypercube show that the adaptive 
blocking strategy indeed performs better than any fixed width blocking 
strategy. 

In the second part of our thesis we develop methods for introducing 
pivoting into the distributed QR factorization algorithm. Incorporating the 
traditional column pivoting strategy in a straightforward manner introduces 
a global synchronization constraint which results in increased 
communication overhead. A strictly local pivoting scheme avoids the 
resulting loss in efficiency, but has to be monitored for reliability. To 
this end, we introduce an incremental condition estimator which allows us 
to update the estimate of the smallest singular value of an upper 
triangular matrix R as new columns are added to R. The update requires 
only $O(n)$ flops and the storage of $O(n)$ words between successive steps. 
Experiments indicate that the incremental condition estimator is reliable 
despite its small computational cost. Using the incremental condition 
estimator we are then able to guard against the selection of troublesome 
pivot columns in our local pivoting scheme at little extra cost. Simulation 
results show that the resulting algorithm is about as reliable as the 
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traditional QR factorization algorithm with column pivoting. 
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Two design procedures for parallel computers have been developed, 
which map computations onto processor arrays. The first is a procedure that 
programs a processor array for computing a given expression. It 
consists of the following steps: (1) determining the type of expressions 
that can be evaluated by a given processor array; (2) setting the 
processors in the processor array to carry out a computation within its 
computation space. This mapping procedure has been demonstrated for 
mesh-connected processing networks. 

The second procedure is a contraction mapping procedure that derives a 
target processor array from a directed acyclic graph representation 
of a program. This procedure consists of the following steps: (1) 
representing the given problem by a homogeneous program graph; (2) 
partitioning the vertices of the graph into subsets such that all the 
vertices in the same subset will be executed by one processor; (3) 
characterizing the algebraic relations of delays between computations by a 
fundamental loop matrix; (4) establishing a linear function of delays as a 
performance metric and solving the delays that minimize the linear cost 
function by linear programming; (5) constructing a contracted graph from 
that program graph. The contracted graph delineates the target processor 
array that computes the given problem. This contraction mapping procedure 
is applied to a variety of problems, including algebraic computations and 
character string processing. 
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The search for computer architectures utilizing large numbers of 
processing elements and their application to suitable problems has been a 
continual quest for many researchers. This dissertation presents results of 
an analysis of a multiprocessor architecture applied to the problem of 
database management. 

The database problem is first re-stated in terms of a new data 
model, the Active Graph Model, which employs a graphical 
representation for data (nodes) and relationships (arcs) in addition to 
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concepts from the dataflow model of computation to exploit the parallel 
processing power of the architecture. The nodes of the graph are 
'active' elements which respond to requests in the form of tokens traveling 
along the arcs. This data model and its query language are shown to be 
relationally complete and therefore equivalent in expressive power to the 
Relational Model. 

A mesh-connected array of processing elements forms the basis 
for the architecture. The nodes and arcs of the model are mapped onto the 
architecture and practical algorithms are defined for distributing 
requests, data manipulation, and for sorting and reporting of results. The 
functionality of these algorithms is verified and the performance 
characteristics of the system are measured through an implementation of the 
algorithms on simulated hardware using a standardized evaluation 
methodology. 

The results of the experiments demonstrate that large numbers of 
processors can be used effectively given a sufficiently large problem. 
Additionally, under-utilized processing capability can be used by multiple 
simultaneous requests. Finally, the system is not plagued by interprocessor 
communications bottlenecks which have been identified in other such 
systems. 
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A Frame-Based Image Processing (FBIP) system is designed to provide a 
high throughput computation rate for the applications in image processing 
and pattern recognition. The detailed configuration of the processing 
element and the organization of the memory cell are described. A 
mathematical model is proposed to describe both the hardware architecture 
and the parallelism of the software. Both the logical and arithmetical 
Frame-Based operations are accomplished on a pixel-by-pixel basis. 
According to the geometric properties of the memory cell connection, the 
parallelism of the local processing can be achieved through the 
successive shift-and-accumulate technique. Experimental results of the 
Frame-Based local convolution, the Frame-Based edge operation and the 
Frame-Based cellular operation are presented. 
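
The shift-and-accumulate idea can be illustrated in software: instead of visiting each pixel's neighbourhood, the whole frame is shifted once per kernel position, scaled by that kernel weight, and added into an accumulator frame. The Python sketch below assumes a 3x3 kernel and zero fill at the borders; it illustrates the principle only and says nothing about the actual FBIP hardware. The function names are hypothetical.

```python
def shift(frame, dy, dx):
    """Shift a whole frame by (dy, dx), filling vacated pixels with 0."""
    h, w = len(frame), len(frame[0])
    return [
        [frame[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else 0
         for x in range(w)]
        for y in range(h)
    ]

def local_convolution(frame, kernel):
    """3x3 local operation built from shifted, weighted, summed frames.

    Computes sum over (ky, kx) of kernel[ky][kx] * frame[y+ky-1][x+kx-1],
    i.e. the correlation form, which equals convolution for symmetric
    kernels.
    """
    h, w = len(frame), len(frame[0])
    acc = [[0] * w for _ in range(h)]
    for ky in range(3):
        for kx in range(3):
            weight = kernel[ky][kx]
            shifted = shift(frame, 1 - ky, 1 - kx)
            # Pixel-by-pixel accumulate; every pixel is independent.
            for y in range(h):
                for x in range(w):
                    acc[y][x] += weight * shifted[y][x]
    return acc

frame = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = local_convolution(frame, box)
```

Each of the nine passes applies the same operation to every pixel at once, which is the property a frame-based parallel architecture can exploit.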

A multi-stage vision system is described to understand the 
three-dimensional surface characteristics from multiple two-dimensional 
images. The structured lighting control technique associated with the 
orthographic projection model is applied such that the computation of the 
surface gradient can be accomplished with Frame-Based operations. Analyses 
are conducted for the estimation of surface orientation with various 
lighting arrangements. Recommendations for simple and general approaches 
are also provided. Furthermore, an element type structure is utilized by 
using the gradient image to describe the surface profile. The triangular 
element, based on the equigradient contour, is chosen as the basic unit for 
surface representation. 
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Abstract: An important part of parallel programming is program partitioning 
and scheduling. Partitioning is the separation of program operations 
into sequential tasks, and scheduling is the assignment of tasks to the 
processors of a computer system. To be effective, automatic methods 
require an accurate representation of the model of computation and the 
target architecture. Current partitioning methods assume the 
macro-dataflow model of computation and the homogeneous/two-level 
architectural model. The former is typically represented as a directed, 
acyclic graph of computation nodes and communication edges. The edges 
map directly to communication channels, but not read/write memories. 
Consequently, current methods optimize assuming the presence of 
communication channels, and not the complex memory systems of NUMA 
architectures; they fail to optimize for a critical component of these 
architectures. In this paper, we extend the conventional graph 
representation of the macro-dataflow model to enable mapping 
heuristics to work with a NUMA architectural model. We describe two 
such heuristics. Simulated execution times of programs show that 
our model and heuristics generate higher quality program mappings than 
current methods. 
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Conference Title: COMPCON '91: 36th Institute of Electrical and Electronic 
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Publication Date: 1991 p 264-269 (601 p) 
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Language : In English 
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Abstract: Mathematical models for predicting the behavior of physical 
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phenomena by computer, as well as for other complex applications, are 
often restricted to two spatial dimensions to limit the computing 
resources required to analyze them. However, real world phenomena 
occur in a three-dimensional space. This paper describes a computer 
that has been built primarily to support both three-dimensional 
simulation of physical phenomena, as well as other applications that 
require three-dimensional models, and visualization of the 
results as volume data. This computer is a massively parallel SIMD 
machine whose processors are interconnected in a three-dimensional 
rectangular lattice, together with a controller that extends this 
lattice to a virtual lattice with many more nodes than there are 
processors. It is called the Data Transport Computer because its 
interprocessor communication structure is capable of moving large 
amounts of data in parallel not only among its processors, but also to 
and from I/O devices. 
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Dept. of Computer Science) 
Source: Journal of Parallel and Distributed Computing (USA) v 9:3. 
Coden: JPDCE ISSN: 0743-7315 
Publication Date: Jul 1990 p 282-296 

Contract Number (Non-DOE) : DCR84-05241 
Language: In English 

Abstract: The quadtree representation of matrices is a uniform 
representation for both sparse and dense matrices which can 
facilitate shared manipulation on multiprocessors. This paper presents 
worst-case and average-case resource requirements for storing and 
retrieving familiar families of patterned matrices: packed, symmetric, 
triangular, Toeplitz, and banded. Using this representation, it 
compares resource requirements of three kinds of permutation 
matrices, as examples of nondense, unpatterned matrices. Exact values 
for the shuffle and bit-reversal permutations (as in the fast Fourier 
transform) and tight bounds on the expected values from purely random 
permutations are derived. Two different measures, density and 
sparsity, are proposed from these values. Analysis of quadtree 
matrix addition relates density of addends to space bounds on their sum 
and relates their sparsity to time bounds for computing that sum. 
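
The quadtree idea in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes a square matrix whose side is a power of two, stores all-zero blocks as None, and uses the hypothetical helper names build, add, and to_dense.

```python
def build(m, r=0, c=0, n=None):
    """Build a quadtree from square matrix m (side a power of two).
    All-zero blocks become None; 1x1 non-zero blocks become scalars."""
    if n is None:
        n = len(m)
    if n == 1:
        return m[r][c] or None
    h = n // 2
    # Quadrant order: north-west, north-east, south-west, south-east.
    kids = [build(m, r, c, h), build(m, r, c + h, h),
            build(m, r + h, c, h), build(m, r + h, c + h, h)]
    return None if all(k is None for k in kids) else kids


def add(a, b):
    """Add two quadtrees of the same size; a zero operand costs nothing."""
    if a is None:
        return b
    if b is None:
        return a
    if not isinstance(a, list):
        return (a + b) or None
    kids = [add(x, y) for x, y in zip(a, b)]
    return None if all(k is None for k in kids) else kids


def to_dense(q, n):
    """Expand a quadtree back into an n x n list of lists."""
    if q is None:
        return [[0] * n for _ in range(n)]
    if n == 1:
        return [[q]]
    h = n // 2
    nw, ne, sw, se = (to_dense(k, h) for k in q)
    return ([x + y for x, y in zip(nw, ne)] +
            [x + y for x, y in zip(sw, se)])
```

In this sketch a sparse operand shortens the addition because None subtrees are returned without being visited, which is the kind of sparsity-to-time relationship the abstract refers to.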
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Availability: NTIS, PC A03/MF A01 - OSTI; GPO Dep. 1 
Abstract: The Multiple Crossbar Network (MCN) is a prototype High-Speed 
Local Network at the Los Alamos National Laboratory. It will 
interconnect supercomputers, network servers and workstations from 
various commercial vendors. The MCN can also serve as a backbone for 
message traffic between local area networks. The MCN is a switched 
local network of switching nodes called Cross-Point Stars (CPs). 
Hosts and CPs are connected by 800-Mbit/s (100-Mbyte/s) point-to-point 
ANSI High-Speed Channels. CPs include RISC-based network protocol 
processors called Crossbar Interfaces and a switching core called the 
Crossbar Switch. Protocols include physical, data link, intranet, and 
network access functionality. Various internet and transport protocols 
are intended to run above the MCN protocol suite. A network management 
and simple naming service is also included within the Los Alamos 
Network Architecture. Immediate applications include visualization. 
The MCN is also intended to serve as a framework for multicomputer 
applications. 36 refs., 10 figs. 
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Abstract: We have developed four parallel graphics algorithms for 
visualization of complex problems in PDE simulations, radar 
simulation, and other large applications on a 1024-node ensemble with 
a 16-node graphics device. We discuss the impact of system parameters 
on algorithm development and performance. Algorithmic issues include 
multistage routing of graphics data through the ensemble, non-hypercube 
mappings from the ensemble to the graphics system, 
synchronization between ensemble and graphics nodes, and 
synchronization between graphics nodes. These issues apply to both the 
present and anticipated future systems which combine highly parallel 
ensembles and parallel I/O devices. "Best" solutions are described 
for routing, mapping and synchronization on the current hardware. 
Implications are discussed for future hardware and software for 
massively parallel computers. 6 refs., 2 figs., 2 tabs. 
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Publication Date: 2 Dec 1988 p 111 

Report Number(s): AD-A-2037 96/8/XAB; NRL-9167 

Language: English 

Availability: NTIS, PC A06/MF A01. 

Abstract: Pineda's Recurrent Back-Propagation algorithm for neural networks 
was implemented on the Connection Machine, a massively parallel 
processor. Two fundamentally different graph architectures underlying 
the nets were tested: one based on arcs, the other on nodes. 
Confirming the predominance of communication over computation, 
performance measurements underscore the necessity to make connections 
the basic unit of representation. Comparisons between these graph 
algorithms lead to important conclusions concerning the parallel 
implementation of neural nets in both software and hardware. 
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Conference Title: Multiprocessors and array processors 
Conference Location: San Diego, CA, USA Conference Date: 3 Feb 1988 
Publisher: Society for Computer Simulation, San Diego, CA 
Publication Date: 1988 p 91-93 v 

Report Number(s): CONF-880295- 
Language: English 

Abstract: Image computing is a computationally intensive task, and in order 
to meet the increasing performance needs of sophisticated users, new 
hardware architectures must be developed along with advancing software 
algorithms. AT&T Pixel Machines has used a new, highly parallel 
architecture that incorporates both a pipeline of 9 to 18 processing 
nodes and a parallel array of 16 to 64 processing nodes in its new 
image computers, the PXM 900 Series. The PXM 900 provides up to 820 
MFLOPS of peak processing power for applications such as the rendering 
and animation of 3D graphics, data visualization, image 
processing, and a variety of scientific applications. Each processing 
node is based on a high speed, floating point, programmable processor. 
This programmability ensures that the hardware can adapt to new 
advances in software algorithms. The architecture is modular so that 
users can update to higher models as their performance and image 
memory needs increase. 
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Publisher: IEEE Service Center , Piscataway, NJ 

Publication Date: 1987 p 72-80 

Report Number(s): CONF-8706138- 
Language: English 

Abstract: With the advent of VLSI, relatively large processing arrays may 
be realized in a single VLSI chip. Such regularly structured arrays 
take considerably less time to design and test, and fault-tolerance can 
easily be introduced into them. However, only a few computational 
algorithms which can effectively use such regular arrays have been 
developed so far. The authors present an approach to mapping arbitrary 
algorithms, expressed as programs in a data flow language, onto a 
regular array of data-driven processors implemented by a number of VLSI 
chips. Each chip contains a number of processors, interconnected by a 
set of regular paths, and connected to processors in other similar 
chips to form a large array. This array is thus tailored to perform a 
specific computational task, as an attached processor in a larger 
system. The data flow program is first translated into a graph 
representation, the data flow graph, which is then mapped onto a 
finite but (theoretically) unbounded array of identical processors. 
Each node in the graph represents an operation which can be performed 
by an individual processor in the array. Therefore, the mapping 
operation consists of assigning nodes in the graph to processors in 
the array, and defining the connections between the processors 
according to the arcs in the graph. The last step consists of 
partitioning the unbounded array into a number of segments, to account 
for the number of processors which fit in a single VLSI chip. 
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Title: Intelligent simulation environments 

Series/Collection Title: Simulation Series. Volume 17 Number 1 
Conference Title: Society for Computer Simulation (SCS) multiconference 
Conference Location: San Diego, CA, USA Conference Date: 23 Jan 1986 
Publisher: Society for Computer Simulation, San Diego, CA 
Publication Date: 1986 p 3-8 
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Abstract: After general considerations on a parallel approach of graph 
theory problems, the authors present a specific problem: "To find 
Hamiltonian circuits in a 3-vertex connected cubic graph". This 
problem is similar to the "Travelling Salesman Problem". They 
present a parallel Knowledge Representation of the graph. The 
parallel algorithm uses parallel Propagation of Cellular Automata. This 
Exhaustive approach is combined with a powerful Heuristic method. The 
result is a powerful polynomial algorithm which finds Hamiltonian 
circuits in complex graphs. Meanwhile, an open problem is discussed 
and research on cases where this algorithm fails is reported. 
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Title: A modular massively parallel processor for volumetric 

visualisation processing 
Author(s): Krikelis, A. 

Author Affiliation: Aspex Microsyst. Ltd., Brunel Univ., Uxbridge, UK 
Conference Title: High Performance Computing for Computer Graphics and 
Visualisation. Proceedings of the International Workshop p. 101-24 
Editor(s): Chen, M.; Townsend, P.; Vince, J. A. 
Publisher: Springer-Verlag, Berlin, Germany 

Publication Date: 1996 Country of Publication: Germany xvi+287 pp. 
ISBN: 3 540 76016 4 Material Identity Number: XX95-01869 

Conference Title: Proceedings of International Workshop on High 
Performance Computing for Computer Graphics and Visualization 
Conference Sponsor: High Educ. Funding Council for Wales 
Conference Date: 3-4 July 1995 Conference Location: Swansea, UK 
Language: English Document Type: Conference Paper (PA) 
Treatment: Practical (P) 

Abstract: A Modular Massively Parallel Processor capable of achieving 
real-time/interactive performance for volumetric visualisation 
applications is presented in this paper. The processor comprises identical 
SIMD processing nodes, which can be configured through a Data Transfer 
Network to support SIMSIMD and MIMSIMD configurations, while supporting 
independent Data I/O of 80 Mbytes/sec per node. For volumetric 
visualisation computation the system operates in SIMSIMD configuration 
with voxel slices equally distributed to each node. Data formation and 
classification, which are based on traditional image processing techniques, 
are performed with data local to each node using the node's SIMD 
computational power, which is implemented using the Associative String 
Processor (ASP). For data manipulation computation, each node accesses data 
from remote nodes (with all the nodes using similar access patterns) through 
the Data Transfer Network. Once access to remote nodes has been achieved, 
the data are processed on the ASP one slice (or a group of slices) at a time 
for data viewing (i.e. shading and front-to-back composition). (27 Refs) 

Copyright 1997, IEE 
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Author Affiliation: Sch. of Comput . & Inf. Sci., Syracuse Univ., NY, USA 
Conference Title: Conference Proceedings of the 1995 International 
Conference on Supercomputing p. 289-98 
Publisher: ACM, New York, NY, USA 

Publication Date: 1995 Country of Publication: USA xii+448 pp. 
ISBN: 0 89791 728 6 Material Identity Number: XX95-01418 

U.S. Copyright Clearance Center Code: 0 89791 728 6/95/000. $3.50 
Conference Title: Proceedings of 9th ACM International Conference on 
Supercomputing 

Conference Sponsor: ACM 

Conference Date: 3-7 July 1995 Conference Location: Barcelona, Spain 
Language: English Document Type: Conference Paper (PA) 
Treatment: Theoretical (T) 

Abstract: A large number of data-parallel applications can be represented 
as computational graphs from the perspective of parallel computing. The 
nodes of these graphs represent tasks that can be executed concurrently, 
while the edges represent the interactions between them. Further, the 
computational graphs derived from many applications are such that the 
vertices correspond to multi-dimensional coordinates, and the interaction 
between computations is limited to vertices that are physically 
proximate. The authors show that graphs with these properties can be 
transformed into simple architecture-independent representations that 
encapsulate the locality in these graphs. This representation allows a 
fast mapping of the computational graph onto the underlying architecture at 
the time of execution. This is necessary for environments where 
available computational resources can be determined only at the time of 
execution or that change during execution. (32 Refs) 
Copyright 1997, IEE 
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Conference Title: 1996 IEEE International Conference on Acoustics, 
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Publisher: IEEE, New York, NY, USA 

Publication Date: 1996 Country of Publication: USA 6 vol. lvii+3588 
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ISBN: 0 7803 3192 3 Material Identity Number: XX96-02719 

U.S. Copyright Clearance Center Code: 0 7803 3192 3/96/$5.00 
Conference Title: 1996 IEEE International Conference on Acoustics, 
Speech, and Signal Processing Conference Proceedings 
Conference Sponsor: Signal Process. Soc. IEEE 

Conference Date: 7-10 May 1996 Conference Location: Atlanta, GA, USA 
Language: English Document Type: Conference Paper (PA) 
Treatment: Theoretical (T) 

Abstract: This paper presents an algorithm to directly perform 
morphological operations on images represented by quadtrees and produce the 
dilated/eroded images, also represented by quadtrees. As in many other 
algorithms that execute on the quadtree representation of an image, the 
execution time is proportional to the number of nodes in the 
quadtree, rather than to the number of pixels in the original image array. 
In our algorithm, only black nodes have to be processed for dilation, and 
only white nodes have to be processed for erosion. We also performed 
experiments to show that the execution time for binary images that can 
be effectively represented by quadtrees can be significantly reduced, 
compared to direct computation on the original image arrays. (8 Refs) 

Copyright 1997, IEE 
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Publisher: IEEE Comput . Soc . Press, Los Alamitos, CA, USA 
Publication Date: 1993 Country of Publication: USA xv+423 pp. 
ISBN: 0 8186 3940 7 

U.S. Copyright Clearance Center Code: 1070-2385/93/$3.00 
Conference Title: Proceedings Visualization '93 

Conference Sponsor: IEEE Comput. Soc. Tech. Committee on Comput. Graphics 
ACM/SIGGRAPH 

Conference Date: 25-29 Oct. 1993 Conference Location: San Jose, CA, 
USA 

Language: English Document Type: Conference Paper (PA) 
Treatment: Practical (P) 

Abstract: The issue of monitoring the execution of asynchronous, 
distributed algorithms on loosely-coupled parallel processor systems 
is important for the purposes of (i) detecting inconsistencies and flaws in 
the algorithm, (ii) obtaining important performance parameters for the 
algorithm, and (iii) developing a conceptual understanding of the 
algorithm's behavior, for given input stimulus, through visualization. 
For a particular class of asynchronous distributed algorithms that may be 
characterized by independent and concurrent entities that execute 
asynchronously on multiple processors and interact with one another through 
explicit messages, the following reasoning applies. Information about the 
flow of messages and the activity of the processors may contribute 
significantly towards the conceptual understanding of the algorithm's 
behavior and the functional correctness of the implementation. The 
computation and subsequent display of important parameters, based upon the 
execution of the algorithm, is an important objective of DIVIDE. For 
instance, the mean and standard deviation values for the propagation delay 
of ATM cells between any two given Broadband-ISDN (BISDN) nodes in a 
simulation of BISDN network under stochastic input stimulus, as a function 
of time, are important clues to the degree of congestion in the 
Broadband-ISDN network. Although the execution of the algorithm typically 
generates high resolution data, a coarse-level visual 
representation of the data may often be useful in facilitating the conceptual 
understanding of the behavior of the algorithm. DIVIDE permits a user to 
specify a resolution less than that of the data from the execution of the 
algorithm, which is then utilized to coalesce the data appropriately. Given 
that this process requires significant computational power, for efficiency, 
DIVIDE distributes the overall task of visual display across a number of 
user-specified workstations that are configured as a loosely-coupled parallel 
processor. DIVIDE has been implemented on a heterogeneous network of SUN 
SPARC 1+, SPARC 2, and 3/60 workstations, and performance measurements 
indicate significant improvement over that of a uniprocessor-based visual 
display. (24 Refs) 
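
The coalescing step described in the abstract above (reducing high-resolution data to a user-specified coarser resolution and reporting mean and standard deviation) might look roughly like the following Python sketch. It is an illustration only, not DIVIDE's code; the function name coalesce and the fixed-size binning rule are assumptions.

```python
from statistics import mean, pstdev


def coalesce(samples, factor):
    """Group consecutive samples into bins of `factor` values and
    summarize each bin as (mean, population standard deviation).
    The last bin may be shorter than `factor`."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    bins = [samples[i:i + factor] for i in range(0, len(samples), factor)]
    return [(mean(b), pstdev(b)) for b in bins]
```

For example, a series of per-cell propagation delays sampled every time step could be passed with factor=10 to obtain one summary point per ten steps for display.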
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Conference Date: 5-7 Sept. 1994 Conference Location: Como, Italy 
Language: English Document Type: Conference Paper (PA) 
Treatment: Applications (A); Practical (P) 

Abstract: Professionals in various fields such as medical imaging, 
biology and civil engineering require rapid access to huge amounts of 
uncompressed pixmap image data. Multi-media interfaces further increase the 
need for large image databases. In order to fulfill these requirements, the 
GigaView parallel image server architecture relies on arrays of 
intelligent disk nodes, each disk node being composed of one processor 
and one disk. This contribution analyzes through simulation and 
experimentation the behavior of the GigaView under single and multiple 
requests, and compares it to the behavior of RAID servers. It evaluates 
image visualization window access times under various parameters such 
as load factors and the number of cooperating disk nodes. Under single 
request, the GigaView image server can be modeled as a single 
high-throughput low-latency secondary storage device. Under multiple requests, 
the notions of utilization and maximum sustainable throughput define 
accurately the behavior of the GigaView. (9 Refs) 
Copyright 1995, IEE 
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Abstract: Several domain partitioning algorithms have been proposed to 
effect load-balancing among processors in parallel finite element 
analysis. The recursive spectral bisection (RSB) algorithm has been shown 
to be effective. However, the bisection nature of the RSB results in 
partitions of an integer power of two, which is too restrictive for 
computing environments consisting of an arbitrary number of processors. The 
paper presents two recursive spectral partitioning algorithms, both of 
which generalize the RSB algorithm for an arbitrary number of partitions. 
These algorithms are based on a graph partitioning approach which includes 
spectral techniques and graph representation of finite element meshes. 
The 'algebraic connectivity vector' is introduced as a parameter to assess 
the quality of the partitioning results. Both node-based and 
element-based partitioning strategies are discussed. The spectral 
algorithms are also evaluated and compared for coarse-grained partitioning 
using different types of structures modelled by 1D, 2D and 3D finite 
elements. (28 Refs) 
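
For orientation, a single spectral bisection step of the kind that RSB applies recursively can be sketched in pure Python as follows. This is a textbook-style illustration, not the paper's generalized algorithm: it assumes a connected graph with at least two nodes given as adjacency lists, approximates the Fiedler vector by shifted power iteration, and splits at the median. The names fiedler_vector and bisect are hypothetical.

```python
def fiedler_vector(adj, iters=500):
    """Approximate the eigenvector of the graph Laplacian L belonging to
    its smallest non-zero eigenvalue, by power iteration on shift*I - L
    restricted to vectors orthogonal to the all-ones vector."""
    n = len(adj)
    deg = [len(nb) for nb in adj]
    shift = 2 * max(deg) + 1  # exceeds every eigenvalue of L
    v = [i - (n - 1) / 2 for i in range(n)]  # starts with zero mean
    for _ in range(iters):
        # w = (shift*I - L) v, with (L v)_i = deg_i*v_i - sum of neighbours
        w = [shift * v[i] - (deg[i] * v[i] - sum(v[j] for j in adj[i]))
             for i in range(n)]
        m = sum(w) / n
        w = [x - m for x in w]  # remove the constant component
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def bisect(adj):
    """Split node indices into two halves by sorting on the Fiedler vector."""
    v = fiedler_vector(adj)
    order = sorted(range(len(adj)), key=lambda i: v[i])
    half = len(adj) // 2
    return order[:half], order[half:]
```

Applying bisect recursively to each half yields 2, 4, 8, ... parts, which is the power-of-two restriction that the paper's algorithms are designed to lift.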
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Abstract: The authors demonstrate a discrete-event simulator for 
evaluating the performance of various routing algorithms (probabilistic 
and deterministic) in both multicomputer, and multistage parallel 
interconnection networks. This simulator can route packets in a mesh, a 
generalized hypercube, or a star graph. It can also realize connection 
requirements for a given permutation, on an omega network, an augmented 
data manipulator, or a multistage circuit-switched hypercube. Current 
multicomputer, and multiprocessor designs tend to become increasingly 
complex. Therefore, it is very difficult to analytically evaluate their 
performance, and simulator tools become very useful. The authors are 
motivated to design a flexible and efficient simulator of various routing 
schemes, on both multicomputer, and multiprocessor architectures. Such 
parallel systems usually incorporate a highly symmetric interconnection 
network. By exploiting this symmetry, they avoid an explicit 
representation (adjacency list, or matrix) of the underlying network. 
Instead, the communication algorithm is 'smart' to route messages along 
adjacent nodes. The implementation is also event-driven, which is faster 
and easier to parallelize. The available options are described and a 
glimpse provided of current results and future extensions of this tool. 
(15 Refs) 
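
To show how routing can proceed without an adjacency list or matrix, the following Python sketch uses dimension-order (e-cube) routing on a binary hypercube, where neighbours are found by flipping one address bit. This is a standard deterministic scheme chosen for illustration; whether the simulator described above uses it is an assumption, and the function name ecube_route is hypothetical.

```python
def ecube_route(src, dst, dim):
    """Return the node sequence from src to dst in a dim-dimensional
    binary hypercube, correcting differing address bits lowest first.
    No adjacency structure is stored: a neighbour is node ^ (1 << bit)."""
    if not (0 <= src < 2 ** dim and 0 <= dst < 2 ** dim):
        raise ValueError("node label out of range")
    path = [src]
    node = src
    for bit in range(dim):
        mask = 1 << bit
        if (node ^ dst) & mask:
            node ^= mask  # hop to the neighbour across this dimension
            path.append(node)
    return path
```

The path length equals the Hamming distance between the two labels, so the sketch needs only the labels themselves and the dimension of the cube.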
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Title: PARAM architecture, programming environment and applications 

Author(s): Bhatkar, V.P. 

Author Affiliation: Centre for Dev. of Adv. Comput., Poona Univ., Pune, 
India 

Conference Title: Applications of Transputers 3. Proceedings of the Third 
International Conference on Applications of Transputers p. 48-59 

Editor(s): Durrani, T.S.; Sandham, W.A.; Soraghan, J.J.; Forbes, S.M. 
Publisher: IOS, Amsterdam, Netherlands 

Publication Date: 1991 Country of Publication: Netherlands 821 pp. 
Conference Sponsor: UK SERC/DTI Initiative on the Eng. Appl. Transputers; 
IEEE; IEE; IOP; et al 

Conference Date: 28-30 Aug. 1991 Conference Location: Glasgow, UK 
Language: English Document Type: Conference Paper (PA) 
Treatment: Practical (P) 

Abstract: PARAM is a multi-user, reconfigurable, scalable, MIMD parallel 
computer with peak performance exceeding 1 GFLOPS, developed under the 
Indian parallel computing initiative. Some notable features of PARAM are 
coherent integration of high-performance vector and signal processing 
nodes, orthogonal supervisory and control bus, innovative packaging of 
compute cluster, and high-bandwidth high-capacity parallel disk array 
support. PARAM is endowed with an advanced integrated parallel programming 
environment, PARAS. PARAS is aimed at a host/back-end hardware model and 
provides an environment that efficiently harnesses the power of parallel 
processing offered by distributed memory, message passing machines such 
as PARAM. The host resident part of PARAS provides an easy-to-use 
environment for developing parallel programs as well as interactive user 
interfaces for profiling and debugging. Functions on the back-end include 
file and process management, message communication between remote tasks, 
mutual exclusion of shared variables and support for process farming, 
scientific data visualisation, debugging and profiling. PARAS offers a 
high bandwidth parallel file system which uses block declustering of files 
across multiple I/O nodes and spindles to balance computation, 
communication and I/O. In addition, PARAS has offline tools to facilitate 
cross program development, algorithm prototyping and load balancing. 
(4 Refs) 
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03863414 INSPEC Abstract Number: C91031278 
Title: Harnessing supercomputers for computational underwater acoustics 

Author(s): Schultz, M.H. 

Author Affiliation: Dept. of Comput. Sci., Yale Univ., New Haven, CT, USA 
Conference Title: Computational Acoustics. Proceedings of the 2nd IMACS 
Symposium p. 239-42 vol.1 

Editor(s): Lee, D.; Cakmak, A.; Vichnevetsky, R. 
Publisher: North-Holland, Amsterdam, Netherlands 

Publication Date: 1990 Country of Publication: Netherlands 3 vol. 
(x+276+x+322+x+343) pp. 
ISBN: 0 444 88723 7 

Conference Sponsor: IMACS; Office Naval Res.; Princeton Univ.; Naval 
Underwater Syst. Center 

Conference Date: 15-17 March 1989 Conference Location: Princeton, NJ, 
USA 

Language: English Document Type: Conference Paper (PA) 
Treatment: General, Review (G) 

Abstract: In the context of computational underwater acoustics, there are 
two distinct regimes: (1) the high end, which will be handled by massively 
parallel supercomputers. This is basically the 'numerical ocean basin' 
which will allow one to study new acoustic phenomena and to validate 
approximate models; (2) the low end, which will be handled by parallel 
workstations. This is basically a vehicle for code development, 
computations using the inexpensive approximate models validated on the high 
end machine, and visualization of the results of the computations. Two 
emerging technologies can be used to tackle the problems: massively 
parallel computers and reduced instruction set computers (RISC) which will 
form the nodes of parallel machines. There are a large number of 
innovative parallel architectures being developed by a large number of 
companies. From the present point of view, probably the most interesting in 
the long run will be the multicomputers, essentially composed of computers 
with their own distinct private memories with some sort of high 
performance interconnect scheme. Such systems not only provide the 
potential for very large amounts of CPU power but are an excellent way of 
providing very large amounts of memory with a very large aggregate 
bandwidth. (3 Refs) 



11/7/62 (Item 10 from file: 2) 



DIALOG(R) File 2: INSPEC 

(c) 2000 Institution of Electrical Engineers. All rts. reserv. 

03067813 INSPEC Abstract Number: C88012849 
Title: Distributed simulation using Petri nets 

Author(s): Tamer Ozsu, M. 

Author Affiliation: Dept. of Comput. Sci., Alberta Univ., Edmonton, 
Alta., Canada 

Conference Title: Proceedings of the 1987 Summer Computer Simulation 
Conference p. 3-8 

Editor(s): Chou, J.Q.B. 

Publisher: SCS, San Diego, CA, USA 

Publication Date: 1987 Country of Publication: USA xliv+1021 pp. 
ISBN: 0 911801 20 0 
Conference Sponsor: SCS 

Conference Date: 27-30 July 1987 Conference Location: Montreal, Que., 
Canada 

Language: English Document Type: Conference Paper (PA) 
Treatment: Practical (P) 

Abstract: One of the problems of traditional simulation techniques is the 
computational cost of running the simulation experiments. With the advances 
in distributed computing, distributed simulation has started to emerge as a 
viable alternative for reducing the computational time of simulations. The 
authors report on a distributed simulation methodology that is based on 
the simulation of Petri nets (C. Petri, 1968). It requires that each model 
be represented as a graph (or a net) which can consist of various subnets. 
Thus, the models are modular by definition, making them amenable to 
distributed simulation. The simulation methodology provides the primitives 
by which subnets in the model can 'communicate' with one another, thus 
facilitating the distribution of model subnets to various computing nodes. 
(22 Refs) 
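
The basic Petri net firing rule on which such a methodology rests can be sketched in Python as below. This is a generic illustration, not the authors' primitives: a marking is a dict from place names to token counts, a transition is an (inputs, outputs) pair of such dicts, and the names enabled and fire are hypothetical.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    inputs, _ = transition
    return all(marking.get(p, 0) >= n for p, n in inputs.items())


def fire(marking, transition):
    """Return the new marking after firing; the old marking is unchanged."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    inputs, outputs = transition
    new = dict(marking)
    for p, n in inputs.items():
        new[p] = new.get(p, 0) - n  # consume input tokens
    for p, n in outputs.items():
        new[p] = new.get(p, 0) + n  # produce output tokens
    return new
```

Two subnets placed on different computing nodes could, under this sketch, communicate through a shared place: one subnet's transition deposits a token in it and the other subnet's transition consumes that token.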
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(Final Technical Report, 1 Jan. - 31 Dec. 1995) 
Massachusetts Inst. of Tech., Cambridge. 
Corp. Source Codes: 001450000; MJ700802 

Sponsor: National Aeronautics and Space Administration, Washington, DC. 
Report No.: NAS 1.26:199399; NASA-CR-199399 
Oct 95 16p 
Languages: English 

Journal Announcement: GRAI9608; STAR3403 
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Country of Publication: United States 

Contract No.: NAG2-884 

The current computing environment that most researchers are using for the 
calculation of 3D unsteady Computational Fluid Dynamic (CFD) results is a 
super-computer class machine. The Massively Parallel Processors (MPPs) 
such as the 160-node IBM SP2 at NAS and clusters of workstations 
acting as a single MPP (like NAS's SGI Power-Challenge array) provide the 
required computation bandwidth for CFD calculations of transient problems. 
Work is in progress on a set of software tools designed specifically to 
address visualizing 3D unsteady CFD results in these super-computer-like 
environments. The visualization is concurrently executed with the CFD 
solver. The parallel version of Visual3, pV3, required splitting up the 
unsteady visualization task to allow execution across a network of 
workstation(s) and compute servers. In this computing model, the network is 
almost always the bottleneck, so much of the effort involved techniques to 
reduce the size of the data transferred between machines. 
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STRIDE Towards Practical 3 - D Device Simulation — Numerical and 
Visualization Considerations. (Reannouncement with New Availability 
Information) 

Wu, K. C.; Chin, G. R.; Dutton, R. W. 

Stanford Univ., CA. Dept. of Electrical Engineering. 

Corp. Source Codes: 009225022; 400852 

Sponsor: Army Research Office, Research Triangle Park, NC. 
Report No.: ARO-28297.4-EL 
Sep 91 10p 

Languages: English Document Type: Journal article 
Journal Announcement: GRAI9604 

Pub. in IEEE Transactions on Computer-Aided Design, v10 n9 p1132-1140, 
Sep 91. Order this product from NTIS by: phone at 1-800-553-NTIS (U.S. 
customers); (703)605-6000 (other countries); fax at (703)321-8547; and 
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Springfield, VA, 22161, USA. 

NTIS Prices: PC A02/MF A01 

Country of Publication: United States 

Contract No.: DAAL03-91-G-0152 

A 3D device solver (STRIDE), capable of solving grids up to 250,000 
nodes, has been developed on a message-passing multiprocessor. By the use 
of iterative matrix solvers and Gummel style nonlinear iteration schemes, 
user memory per node is reduced over use of direct solvers and Newton 
schemes. By using an independent-edge-grouping scheme to increase the 
vector length to the order of the number of variables, the vector 
processing efficiency is significantly increased without additional 
floating point operations. We extend the modified-singular-perturbation 
(MSP) scheme to two-carrier simulations. This significantly speeds up the 
convergence rate of Gummel style nonlinear iterations. Physical insight 
gained from the MSP schemes also leads to an automatic switching scheme 
between various nonlinear schemes based on the monitoring of certain matrix 
parameters. This allows the incorporation of a previously proposed 
Newton-lC scheme which offers the best CPU performance for normal bipolar 
simulations. When combined with current convergence criteria, a set of 
MSP-inspired convergence criteria is better able to recognize a practically 
converged solution. A novel global convergence scheme is also developed 
based on insight from MSP principles. Interactive user interface and links 
to graphics tools are provided to support the tool integration efforts. 
Application of STRIDE is demonstrated by an analysis of latchup trigger 
current dependence on layout arrangement. TCAD, Device Simulation, Parallel 
Iterative Solver, Staggered Nonlinear Algorithms, CMOS Latchup. 
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applications 

Euro-Par '96 : parallel processing : Lyon, August 26-29, 1996

SIMON J; WIERUM J M 

BOUGE Luc, ed; FRAIGNIAUD Pierre, ed; MIGNOTTE Anne, ed; ROBERT Yves, ed 
Paderborn Center for Parallel Computing -PC SUP 2 Fuerstenallee 11, 33095 
Paderborn, Germany 

International Euro-Par conference, 2 (Lyon FRA) 1996-08-26 
Journal: Lecture notes in computer science, 1996 , 1123 1509-1522 
ISSN: 0302-9743 Availability: INIST-16343; 354000063994311970 
No. of Refs.: 16 ref.

Document Type: P (Serial); C (Conference Proceedings) ; A (Analytic) 
Country of Publication: Germany; United States 
Language: English 

A performance prediction method is presented, which accurately predicts
the expected program execution time on massively parallel systems. We
consider distributed-memory architectures with SMP nodes and a fast
communication network. The method is based on a relaxed task graph model, a
queuing model, and a memory hierarchy model. The relaxed task graph is a
compact representation of communicating processes of an application
mapped onto the target machine. Simultaneous accesses to the resources of
a multi-processor node are modeled by a queuing network. The execution
time of the application is computed by an evaluation algorithm. An
example application implemented on a massively parallel computer
demonstrates the high accuracy of our model. Furthermore, two applications
of our accurate prediction method are presented.

Copyright (c) 1997 INIST-CNRS. All rights reserved. 
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Abstract: The WK-recursive networks own two structural advantages: 

expansibility and equal degree. A network is expansible if no changes
to node configuration and link connection are necessary when it is
expanded, and of equal degree if its nodes have the same degree no
matter what the size is. However, the number of nodes contained in a
WK-recursive network is restricted to d^t, where d > 1 is the size of
the basic building block and t greater than or equal to 1 is the level
of expansion. The incomplete WK-recursive networks, which were proposed
to relieve this restriction, are allowed to contain an arbitrary number
of basic building blocks, while preserving the advantages of the
WK-recursive networks.

Designing shortest-path routing algorithms on incomplete networks
is in general more difficult than for complete networks. The reason is
that most incomplete networks lack a unified representation. One of the
contributions of this paper is to demonstrate a useful representation
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, i.e., the multistage graph representation , for the incomplete 
WK-recursive networks. On the basis of it, a shortest-path routing
algorithm is then proposed. With O(d · t) time preprocessing, this
algorithm takes O(t) time for each intermediate node to determine the
next node along the shortest path.
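
The size restriction described in this abstract can be illustrated with a short sketch. It assumes only what the abstract states: a complete WK-recursive network has d^t nodes, and an incomplete one is made of an arbitrary number of basic building blocks of size d (so its node count is taken here to be a multiple of d). The function names are hypothetical and are not from the paper.

```python
def complete_wk_level(num_nodes, d):
    """Return the level t >= 1 with d**t == num_nodes, or None if no such t exists."""
    if d <= 1:
        raise ValueError("basic building block size d must be greater than 1")
    size, t = d, 1
    while size < num_nodes:
        size *= d
        t += 1
    return t if size == num_nodes else None


def fits_incomplete_wk(num_nodes, d):
    """Assumption: an incomplete network is a whole number of d-node building blocks."""
    if d <= 1:
        raise ValueError("basic building block size d must be greater than 1")
    return num_nodes >= d and num_nodes % d == 0


if __name__ == "__main__":
    for n in (16, 48, 64):
        print(n, complete_wk_level(n, 4), fits_incomplete_wk(n, 4))
```

With d = 4, a 48-node system is not a complete WK-recursive network (48 is not a power of 4) but does consist of 12 whole building blocks, which is the case the incomplete networks are meant to cover.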
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415/ATHENS//GA/30602 
Journal: JOURNAL OF SUPERCOMPUTING, 1996 , V10, N3, P243-269 
ISSN: 0920-8542 

Language: ENGLISH Document Type: ARTICLE 

Abstract: A reconfigurable network termed as the reconfigurable multi-ring
network (RMRN) is described. The RMRN is shown to be a truly scalable
network in that each node in the network has a fixed degree of
connectivity and the reconfiguration mechanism ensures a network
diameter of O(log(2) N) for an N-processor network. Algorithms for the
two-dimensional mesh and the SIMD or SPMD n-cube are shown to map very
elegantly onto the RMRN. Basic message passing and reconfiguration
primitives for the SIMD/SPMD RMRN are designed for use as building
blocks for more complex parallel algorithms. The RMRN is shown to be a
viable architecture for image processing and computer vision problems
using the parallel computation of the stereocorrelation imaging
operation as an example. Stereocorrelation is one of the most
computationally intensive imaging tasks. It is used as a visualization
tool in many applications, including remote sensing, geographic
information systems and robot vision.
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Title: A PARALLEL ALGORITHM FOR COMPUTING POLYGON SET OPERATIONS 
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Corporate Source: W VIRGINIA UNIV, CONCURRENT ENGN RES 
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Language: ENGLISH Document Type: ARTICLE 

Abstract: We present a parallel algorithm for performing boolean set
operations on generalized polygons that have holes in them. The
intersection algorithm has a processor complexity of O(m^2 n^2)
processors and a time complexity of O(max(2 log m, log^2 n)), where m
is the maximum number of vertices in any loop of a polygon, and n is
the maximum number of loops per polygon. The union and difference
algorithms have a processor complexity of O(m^2 n^2) and time
complexity of O(log m) and O(2 log m, log n) respectively. The algorithm
is based on the EREW PRAM model. The algorithm tries to minimize the
intersection point computations by intersecting only a subset of loops
of the polygons, taking advantage of the topological structure of the
two polygons. We believe this will result in better performance on
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the average as compared to the worst case. Though all the algorithms 
presented here are deterministic, randomized algorithms such as sample 
sort can be used for the sorting subcomponent of the algorithms to 
obtain fast practical implementations. (C) 1995 Academic Press, Inc. 
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Application of MPP to particle tracking 
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AIP Conf. Proc; 297(1), 19-26 (25 DEC. 1993) CODEN: APCPC 
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Conference Title: Computational accelerator physics 
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Conference Year: 22-26 Feb 1993 
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The SSC requires massive simulation to support the design, 
commissioning, and operation of the accelerator complex. To this end, the 
laboratory has made a significant commitment to MPP for this application. 

A 64 node iPSC/860 was acquired in January of 1991 and has been used
extensively in tracking studies of various accelerators of the SSC injector
chain. This talk will detail the accomplishments to date and lessons
learned. The most basic observation one can make about tracking on a
parallel computer is that for a thin element kick code in the absence of
space charge, the problem has a natural granularity that makes it
"embarrassingly parallel." One simply distributes the particles over
available nodes and tracks. No internode communication is required
except for a small amount of diagnostic information that is generated as
the run progresses. Hence, the parallel efficiency approaches 100 percent
and the problem is scalable to a large number of processors. This
seemingly trivial observation leads immediately to two important
conclusions regarding the hardware configuration used to do the tracking.
The number of computational nodes should not exceed the number of
particles tracked and the overall performance of the calculation will be
dominated by single node performance. The situation becomes less clear
as more internode communication is added. The performance of the MPP
system on runs where beam emittance is monitored or beam instrumentation is
simulated is progressively influenced by message passing overhead. In
general, one must be aware that it is sometimes better to abandon the
natural granularity and compromise network performance in the interests
of optimizing individual node performance. The addition of space
charge forces to the tracking code requires a PIC calculation to be done
concurrently with the thin element tracking. A procedure for dynamically
sorting particles on to nodes that optimizes machine performance will
be described. The application of an MPP to serve as the engine of a real
time simulator will be discussed. Such factors as predictability of
network collisions and the interrupt response time of the individual node
required to write out data become important. An interactive
visualization system designed to display the results from the space
charge calculation will be described. It has great flexibility in choice
of viewpoint, reference frame and data density.
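
The "distribute the particles over available nodes" step described above can be sketched as a simple block partition. This is only an illustration under stated assumptions: particles are identified by integer indices, each node receives one contiguous block, and the function names are hypothetical rather than taken from the SSC tracking code.

```python
def partition_particles(num_particles, num_nodes):
    """Split particle indices 0..num_particles-1 into near-equal contiguous blocks, one per node."""
    if num_particles < 0 or num_nodes <= 0:
        raise ValueError("need num_particles >= 0 and num_nodes > 0")
    base, extra = divmod(num_particles, num_nodes)
    blocks = []
    start = 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional particle each.
        size = base + (1 if node < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def idle_nodes(num_particles, num_nodes):
    """Count nodes that receive no particles at all."""
    return sum(1 for block in partition_particles(num_particles, num_nodes) if len(block) == 0)


if __name__ == "__main__":
    print([len(b) for b in partition_particles(10, 4)])
    print(idle_nodes(3, 8))
```

When there are more nodes than particles, the surplus nodes receive empty blocks and do no work, which is the reason the abstract gives for not letting the node count exceed the particle count.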
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03487204 Supplier Number: 47190601 (USE FORMAT 7 FOR FULLTEXT) 
SUN MOVES IN ON THE HIGH PERFORMANCE COMPUTING MARKET WITH THE ULTRA HPC 
SERVER LINE 

Computergram International, n3115, pN/A 
March 7, 1997 

Language: English Record Type: Fulltext 
Document Type: Newswire; Trade 
Word Count: 572

(USE FORMAT 7 FOR FULLTEXT) 
TEXT: 

...CI No 3,021), Sun Microsystems Inc is pursuing Silicon Graphics Inc and
other high-performance computing players by bundling its Ultra Enterprise
SMP symmetric multi-processing servers with a raft of parallelising
software, development tools and applications and...

...machines, Sun will sell and support version 2.2 of Platform Computing
Corp's popular Load Sharing Facility software for monitoring and managing
resources, plus Fortran77, Fortran90, multi-threading development and
debugging tools. By year-end it will introduce...

...it acquired from Thinking Machines Corp, in the form of the Prism
parallel debugging and visualisation tools. At least some of the
clustering options will be provided by Sun's forthcoming...

...GlobalWorks will enable developers to address a cluster of systems as a
single virtual processing node. The 1Gbps SCI Sbus adaptor boards being
created for clustering Sun servers by Dolphin Interconnect...

...At the same time, Solaris is scheduled to support 64-bit virtual address
space and cluster system management plus the cluster file system
required to allow users to write their own clustered applications. The
servers start...
19970307 
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PARALLEL COMPUTING APPLICATIONS GROUP 
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Word Count: 116 

(USE FORMAT 7 FOR FULLTEXT) 
TEXT: 

...of Trade and Industry, to develop parallel computing applications for a 
computer architecture designed by parallel processing specialist Caplin 
Cybernetics, built around the new Inmos T9000 Transputer: the goal of the 
project . . . 

...limitations of existing parallel machines; the system will be designed 
to offer peak floating-point performance of 100 MFLOPS per node;
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planned applications for the general-purpose system include oil reservoir 
simulation, three-dimensional visualisation and neural network
modelling. 
19910426 
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Word Count: 421 
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TEXT: 

...R3000 RISC chip, each with a tightly-coupled Intel 80860 RISC part 
acting as a vector co-processor. Each processor is rated by Stardent at
32 MIPS and 48 MFLOPS giving an overall performance of 64 MIPS and 96 
MFLOPS. They run version 3 of the Application Visualisation System 
graphics subsystem from the Stellar side of the company - this uses two 
80860s for three-dimensional... 

...claimed to perform 190,000 three-dimensional vectors, and 40,000 
100-pixel gouraud-shaded triangles operations per second. Running a 
version of AT&T's Unix V.3 and the... 

...what it describes as the "world's first medical imaging supercomputer." 
The Stardent 3000VS Series Visualisation Systems are essentially the 
Ardent-based 3000 systems running Stellar's VX graphics subsystem, using...

...32MHz MIPS R3000 part, available in one to four processor 
configurations, offering a top-end performance of 128 MFLOPS going from 
$100,000 to $300,000. They are available as upgrades... 
19900927 
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WORD COUNT: 4 67 LINE COUNT: 00036 

...ABSTRACT: its $750,000 Connection Machine 5 Scale 3, a scalable
supercomputer that combines peak processing capacity of up to 4GFLOPS,
the CMOST Unix operating system and integrated multiGbytes of file storage.
The CM 5 Scale 3 also features multiple 9.6Gbyte disk storage nodes, up
to 32 of Thinking Machines' 128MFLOPS parallel processing nodes and
the full set of Connection Machine software. CM 5 Scale 3 is the company...

...to the company because such machines fit into the smaller configurations 
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where most of the parallel processing potential is located. Thinking 
Machines also introduces its first fully integrated version of the 
Application Visualization System, the CM AVS. Intended for parallel 
computers, CM AVS enables users to interactively visualize... 
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... Design/Computer Aided Manufacturing. It has a planned availability 

of October 25. o IBM POWER Visualization System: This total solution for 
scientific visualization combines advanced hardware, an integrated user 
environment optimized for visualization, and the latest in communications
and storage technology to give scientists a new tool for their most 
challenging projects. Users access the power of the visualization system 
through a RISC System/6000 that functions as a visualization workstation. 
Planned availability is November 22, with prices ranging from $600,000 to 
$2 million depending on options selected. Elements of the system include: - 
the IBM POWER Visualization Server with up to 32 parallel processors 
that features the Data Explorer integrated visualization environment, 
which supports industry-standard X Window System and OSF/Motif interfaces - 
a dedicated RISC... 

...the optional IBM Disk Array Subsystem for holding the large amounts of 
data needed for visualization projects. This storage method, with a 
capacity of up to 170 gigabytes, speeds large blocks of data to the 
visualization server at a faster rate than conventional high-performance 
disk storage units. - the optional IBM POWER Visualization Video 
Controller, attached to the visualization workstation, which allows 
high-resolution images generated by the IBM POWER Visualization Server 
to be displayed at the workstation. This enables support for 
High-Definition Television (HDTV) displays. - High Performance Parallel 
Interface (HIPPI) networking capability, which allows data to be 
transferred among the visualization server, disk array and video
controller five to 10 times faster than conventional workstation network
channels. HIPPI also permits the visualization system to connect to
supercomputers and mainframe computers. o IBM AIX Visualization Data
Explorer/6000: This application software product allows a user to perform 
advanced visualization on a standalone RISC System/6000 workstation. Its 
flexible design allows both novice and expert... 

...render data through a rich set of functions compatible with those on the
IBM POWER Visualization System. Planned availability is December 20, with
a price of $5,900. o 9333 High-Performance Disk Drive Subsystem: This
product comes in two models, one a deskside unit that attaches...

...POWERserver 9XX systems. Both models feature a new Serial-Link
connection capability to deliver improved performance.

Up to four subsystems can be attached via a single adapter to provide 
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a total . . . 
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of Trade and Industry, to develop parallel computing applications 
for a computer architecture designed by parallel processing specialist 
Caplin Cybernetics, built around the new Inmos T9000 Transputer: the goal 
of the project . . . 

...limitations of existing parallel machines; the system will be designed 
to offer peak floating-point performance of 100 MFLOPS per node ; 
planned applications for the general-purpose system include oil reservoir 
simulation, three -dimensional visualisation and neural network 
modelling. 

- o - 

Interactive Systems Corp, whose appointment as principal publisher of 
Unix. . . 

19910426 




4 May 10, 2000 11:25 



Ginger Roberts - Search Report 



?show files;ds 

File  15:ABI/INFORM(R) 1971-2000/May 08
         (c) 2000 Bell & Howell
File  88:Gale Group Business A.R.T.S. 1976-2000/May 10
         (c) 2000 The Gale Group
File   9:Business & Industry(R) Jul/1994-2000/May 10
         (c) 2000 Resp. DB Svcs.
File  13:BAMP 2000/Apr W5
         (c) 2000 Resp. DB Svcs.
File 623:Business Week 1985-2000/Apr W5
         (c) 2000 The McGraw-Hill Companies Inc
File 810:Business Wire 1986-1999/Feb 28
         (c) 1999 Business Wire
File 610:Business Wire 1999-2000/May 10
         (c) 2000 Business Wire.
File 647:CMP Computer Fulltext 1988-2000/Apr W5
         (c) 2000 CMP
File 275:Gale Group Computer DB(TM) 1983-2000/May 10
         (c) 2000 The Gale Group
File 674:Computer News Fulltext 1989-2000/Mar W2
         (c) 2000 IDG Communications
File  98:General Sci Abs/Full-Text 1984-1999/Oct
         (c) 1999 The HW Wilson Co.
File  47:Gale Group Magazine DB(TM) 1959-2000/May 10
         (c) 2000 The Gale Group
File  75:TGG Management Contents(R) 86-2000/Apr W5
         (c) 2000 The Gale Group
File 239:Mathsci 1940-2000/Jun
         (c) 2000 American Mathematical Society

Set      Items   Description
S1       38027   (PARALLEL OR PIPELINE OR ARRAY OR VECTOR OR CONCURRENT? OR
                 SIMULTANEOUS?) (2N) (PROCESSOR? ? OR PROCESSING OR SERVER)
S2       15005   HYPERCUBE? ? OR HYPER()CUBE? ? OR SMP OR MPP
S3     3127669   CAPACITY OR PERFORMANCE OR LOAD OR EXECUT?(2N)TIME? ? OR
                 RESOURCE? ? OR THROUGHPUT OR THROUGH()PUT OR TRAFFIC OR
                 CONCURRENCY OR BOTTLENECK? ? OR TRACE()TOOL? ? OR
                 STATISTIC? ? OR WORKLOAD OR CLUSTER(2N)MANAG? OR
                 DATA()HANDLING
S4     2265452   GRAPH? OR VISUAL? OR PICTORIAL OR PICTURE OR 3()D OR
                 THREE()DIMENSIONAL OR 3D OR IMAGE OR IMAGES OR ILLUSTRATION
                 OR X()Y OR XY OR MATRIX OR MATRICES
S5      335161   NODE OR NODES OR VERTEX OR VERTICES OR CORNER OR TRIANGULAR
                 OR TRIANGLE? ? OR CROSS()POINT? ? OR CROSSPOINT? ? OR FORK? ?
S6        3059   (S1 OR S2) AND S3 AND S4 AND S5
S7         182   (S1 OR S2) (S) S3 (S) S4 (S) S5
S8       38068   S4(5N)(REPRESENTATION OR VISUALIZATION OR VISUALISATION)
S9         408   (S1 OR S2) AND S3 AND S5 AND S8
S10        353   S9 AND PY<1998
S11        225   RD (unique items)
S12         10   (S1 OR S2) (S) S3 (S) S5 (S) S8
S13          9   S12 AND PY<1998
S14          9   RD (unique items)
?t14/3,k/all





14/3,K/1 (Item 1 from file: 15)
DIALOG(R)File 15:ABI/INFORM(R)
(c) 2000 Bell & Howell. All rts. reserv.

01222933 98-72328
Powerful on-campus computing for industry
Falcao, Djalma M
IEEE Spectrum v33n6 PP: 32 Jun 1996



ISSN: 0018-9235 JRNL CODE: SPC 

ABSTRACT: The Laboratory for High-Performance Computing at the Federal
University of Rio de Janeiro's Graduate School of Engineering focuses...

... joint projects with the lab. The lab owns 3 different state-of-the-art
high-performance parallel computers: an 8-node Intel iPSC/860
hypercube computer, an 8-processor Cray J90, and a 4-processor IBM SP2
system, as well as SunSparc20 workstations for graphics and
visualization.
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Silicon Graphics Intros Data Warehouse Mining Tools 

(Silicon Graphics unveils MineSet data analysis and mining tools, and
Challenge DataArray, chain of supporting SGI servers)

Newsbytes News Network, p N/A 
April 16, 1996 

DOCUMENT TYPE: Journal (United States) 
LANGUAGE: English RECORD TYPE: Fulltext 
WORD COUNT: 509 

ABSTRACT:

...is based around the concept of a software backplane which will support
data mining and visualization plug-in tools from SGI and independent
software vendors. The Challenge DataArray server cluster can be configured
from two to eight Challenge nodes with each node supporting one to
thirty-six MIPS RISC R4400 or R10000 processors, for a maximum of 288
processors. The array also supports up to 128 gigabytes (GB) of system
memory and 288 fast and wide...

...38 terabytes non-RAID (redundant array of inexpensive disks) and 125
terabytes of RAID disk capacity is possible with the Challenge Data
Array. Shipment is expected in the first half of...
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WORD COUNT: 608 LINE COUNT: 00052 

TEXT: 

...CI No 3,021), Sun Microsystems Inc is pursuing Silicon Graphics Inc 
and other high-performance computing players by bundling its Ultra 
Enterprise SMP symmetric multi-processing servers with a raft of 
parallelising software, development tools and applications and. . . 

...machines, Sun will sell and support version 2.2 of Platform Computing
Corp's popular Load Sharing Facility software for monitoring and managing
resources, plus Fortran77, Fortran90, multi-threading development and
debugging tools. By year-end it will introduce...
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...it acquired from Thinking Machines Corp, in the form of the Prism 
parallel debugging and visualisation tools. At least some of the 
clustering options will be provided by Sun's forthcoming... 

...GlobalWorks will enable developers to address a cluster of systems as a
single virtual processing node. The 1Gbps SCI Sbus adaptor boards being
created for clustering Sun servers by Dolphin Interconnect...

...At the same time, Solaris is scheduled to support 64-bit virtual address
space and cluster system management plus the cluster file system
required to allow users to write their own clustered applications. The
servers start...

19970307 
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...ABSTRACT: its $750,000 Connection Machine 5 Scale 3, a scalable 
supercomputer that combines peak processing capacity of up to 4GFLOPS, 
the CMOST Unix operating system and integrated multiGbytes of file storage. 
The CM 5 Scale 3 also features multiple 9.6Gbyte disk storage nodes, up
to 32 of Thinking Machines' 128MFLOPS parallel processing nodes and 
the full set of Connection Machine software. CM 5 Scale 3 is the company... 

...to the company because such machines fit into the smaller configurations 
where most of the parallel processing potential is located. Thinking 
Machines also introduces its first fully integrated version of the 
Application Visualization System, the CM AVS. Intended for parallel
computers, CM AVS enables users to interactively visualize... 

19921200 
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Design/Computer Aided Manufacturing. It has a planned availability 
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of October 25. o IBM POWER Visualization System: This total solution for 
scientific visualization combines advanced hardware, an integrated user 
environment optimized for visualization, and the latest in communications
and storage technology to give scientists a new tool for their most 
challenging projects. Users access the power of the visualization system 
through a RISC System/6000 that functions as a visualization workstation. 
Planned availability is November 22, with prices ranging from $600,000 to 
$2 million depending on options selected. Elements of the system include: - 
the IBM POWER Visualization Server with up to 32 parallel processors 
that features the Data Explorer integrated visualization environment, 
which supports industry-standard X Window System and OSF/Motif interfaces - 
a dedicated RISC... 

...the optional IBM Disk Array Subsystem for holding the large amounts of 
data needed for visualization projects. This storage method, with a 
capacity of up to 170 gigabytes, speeds large blocks of data to the 
visualization server at a faster rate than conventional high-performance 
disk storage units. - the optional IBM POWER Visualization Video 
Controller, attached to the visualization workstation, which allows 
high-resolution images generated by the IBM POWER Visualization Server 
to be displayed at the workstation. This enables support for 
High-Definition Television (HDTV) displays. - High Performance Parallel 
Interface (HIPPI) networking capability, which allows data to be 
transferred among the visualization server, disk array and video
controller five to 10 times faster than conventional workstation network
channels. HIPPI also permits the visualization system to connect to
supercomputers and mainframe computers. o IBM AIX Visualization Data
Explorer/6000: This application software product allows a user to perform 
advanced visualization on a standalone RISC System/6000 workstation. Its 
flexible design allows both novice and expert... 

...render data through a rich set of functions compatible with those on the
IBM POWER Visualization System. Planned availability is December 20, with
a price of $5,900. o 9333 High-Performance Disk Drive Subsystem: This
product comes in two models, one a deskside unit that attaches...

...POWERserver 9XX systems. Both models feature a new Serial-Link
connection capability to deliver improved performance.

Up to four subsystems can be attached via a single adapter to provide 
a total . . . 

19910729 
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... of Trade and Industry, to develop parallel computing applications 

for a computer architecture designed by parallel processing specialist 
Caplin Cybernetics, built around the new Inmos T9000 Transputer: the goal
of the project . . . 

...limitations of existing parallel machines; the system will be designed 
to offer peak floating-point performance of 100 MFLOPS per node;
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planned applications for the general-purpose system include oil reservoir 
simulation, three-dimensional visualisation and neural network
modelling. 

- o - 

Interactive Systems Corp, whose appointment as principal publisher of 
Unix. . . 

19910426 
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R3000 RISC chip, each with a tightly-coupled Intel 80860 RISC part 
acting as a vector co-processor. Each processor is rated by Stardent at
32 MIPS and 48 MFLOPS giving an overall performance of 64 MIPS and 96 
MFLOPS. They run version 3 of the Application Visualisation System 
graphics subsystem from the Stellar side of the company - this uses two 
80860s for three-dimensional... 

...claimed to perform 190,000 three-dimensional vectors, and 40,000 
100-pixel gouraud-shaded triangles operations per second. Running a 
version of AT&T's Unix V.3 and the... 

...what it describes as the "world's first medical imaging supercomputer." 
The Stardent 3000VS Series Visualisation Systems are essentially the 
Ardent-based 3000 systems running Stellar's VX graphics subsystem, using...

...32MHz MIPS R3000 part, available in one to four processor 
configurations, offering a top-end performance of 128 MFLOPS going from 
$100,000 to $300,000. They are available as upgrades... 

19900927 
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Byline: Joseph Maglitta 

Journal: Computerworld Page Number: 92 

Publication Date: July 15, 1996 

Word Count: 1008 Line Count: 100 

Publication Year: 1996 
Text:

...7,000 IBM PCs and ThinkPads, 80 AS/400 servers, two RS/6000 SP
massively parallel processors APPLICATIONS 100 at 30 venues ON-SITE
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USERS150, 000, including: 40,000 volunteers 31... 

...employees 15,000 athletes 15,000 members of the media 100 heads of
state WEB TRAFFIC Before Games: About 250,000 visits daily During
Games: More than 6 million expected daily...

...to a secure DB2 database on an SP2. INTERNET SERVERS One RS/6000 SP2
scalable parallel processor with 52 nodes in Southbury,
Conn. SECOND SERVER One RS/6000 SP2 with 16 nodes in Hawthorne,
N.Y. Each node has 250M to 512M bytes of memory and 4G bytes of DASD.
Systems have Asynchronous...

...WORKS: Satellite data for 29 square kilometers around the Games gets fed
into a 30-node SP2 running IBM's visualization data explorer.
TECHNOLOGY OPERATIONS CENTER Mission control. No, you can't surf through
here, either...
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1978 , 

The authors derive upper and lower performance bounds for optimal
graph-theoretic algorithms which use identical processors in parallel.
Two models are considered: an unbounded number of parallel processors,
and a number bounded by a constant K. The processors are capable of
performing arithmetic...

...the case of unbounded parallelism) as well as comparisons, accessing
common memory which contains a graph adjacency representation as input.
Using the fan-in theorem of J. I. Munro and M. Paterson (J...

...lower bounds for serial computability, the authors show that the
following tasks have unbounded parallel performance lower bounds of
Omega(log n): (1) finding connected components in an undirected graph;
(2...

...processors to a constant number of transitive closure computations,
for which the best known unbounded parallel processing time is
O(log^2 n). This establishes an upper bound. In the case...

...T_1/K + L(log K) + 2n, where L is the distance of the node farthest
from the start node. In the case of dense graphs, the breadth-first
search technique is therefore nearly optimal...
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